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Two Nineteenth Century Views 


The integral fo e~" du is so frequently used that it is convenient to have a separate notation 


for it ... From its uses in physics, it will be evident that [the function ii e-" du] may fairly 


claim at present to rank in importance next to the trigonometrical and logarithmic functions. 
—The English mathematician J. W. L. Glaisher (1848-1928) writing in an 1871 issue of 
The Philosophical Magazine on what he proposed to call the error function. 


A mathematician is someone to whom te eX dx= 3 Tis as obvious as twice two is four 
is to you. 

—A quote (perhaps apocryphal) said to be the words of Professor William Thomson 
(1824-1907), aka Lord Kelvin, to a class of almost surely perplexed students. 


The electrical behavior of the nineteenth century trans-Atlantic telegraph cable 
(mathematically studied by William Thomson in the 1850s) leads to the error 
function, and that particular example was certainly in Glaisher’s mind when he 
wrote in 1871 (see Chap. 9). 


(continued) 
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Two Nineteenth Century Views 


Fig. 1 Pierre Simon Laplace, said to be the first (1774)—but was he?—to calculate the 
probability integral. (Image courtesy of AIP Emilio Segré Visual Archives, Physics Today 
Collection) 


(continued) 


Two Nineteenth Century Views 


Fig. 2 Abraham de Moivre, the first (1733) to know the value of the probability integral. 
(Image reproduced under license from Alamy, Inc) 
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Preface 


It is natural to an engineer to ask [a mathematician] for a finite formula for f e-* dx... If 
we fail to satisfy him it is not because of our stupidity, but because the world does not happen 
to have been made that way. 

—The great English mathematician G. H. Hardy (1877-1947), in a frank admission that 
not all integrals are ‘doable’! 


Hardy did not just randomly pick his example of an integral that is impossible to 
evaluate. He picked it because [ e-* dx isa particularly frustrating example of a 
more general situation that puzzles just about every student when beginning the 
study of calculus. While every last one of the fundamental functions (powers, 
logarithms, trigonometric, exponential, and any and all combinations of those 
functions) are differentiable in a finite number of steps, that is definitely not the 
case when it comes to integration. Differentiation appears to be a sequential process 
of elementary steps (perhaps sometimes exceedingly grubby steps, yes, but none- 
theless well-defined steps) that can be relentlessly, even robotically followed as long 
as one has sufficient endurance. But the same is not so when going in the opposite 
direction. 

When I talk about integration in this book, I am talking about the Riemann 
interpretation—after the German mathematician G. F. B. Riemann (1826—1866)— 
which is the interpretation taught in all college freshman calculus courses and 
advanced placement (AP) high school classes. (There are other types of integrals 
and interpretations in mathematics, none of which occur in this book.) The physical 
meaning of the Riemann integral is that [Pe (x)dx is the area bounded by f(x) and the 
X-axis, as X varies from a to b > a, with the understanding that if f(x) > 0 over some 
interval of x the bounded area is positive, while if f(x) < 0 over some interval of x the 
bounded area is negative. The area interpretation leads to what mathematicians call 


'From Hardy’s talk “An Introduction to the Theory of Numbers,” delivered in New York City in 
December 1928 as the 6th Gibbs Lecture (published in the Bulletin of the American Mathematical 
Society, November-December 1929, pp. 778-818). 
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the fundamental theorem of calculus, and we will use it in all that follows in this 
book. Here is how it goes. 
We start by defining 


I(x) = / f(t)dt. (1) 
Then, from the definition of the derivative, 


Og Oe | ew e(t)at — | co Te e(t)at. 


dx «50 € e>0€ e>0€ J, 


From the mean-value theorem of calculus (which I'll take to be intuitively obvious), 
there is some value of t (call it z) between x and x + € such that 


[twa =ef(t). 


That is, the two expressions on each side of the equals sign measure the same area. 
Thus, 


dl 1 : 
a ~ tim, gefle) = lim, (0) 


When e — 0, we have x + € — x and since T is between x and x + €, we see that 
t — x. So, if f(x) is continuous (an assumption that we’ll use throughout this book), 
then f(t) — f(x) and we have 


dl 


eo 


Integrating indefinitely, 


I(x) = J f)ax=F0) +C (2) 


where C is some constant and F(x) is such that 


f(xy= &. (3) 


That is, F(x) is the so-called anti-derivative of f(x). We can eliminate the constant C 
by setting x = a in (2) to write 
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and so 
I(x) — I(a) = [F(x) + C] — [F(a) + C] = F(x) — F(a). 


But, by the area interpretation, we have 
and so 


Finally, setting x = b we at last arrive at the fundamental theorem of calculus that I 
mentioned earlier: 


/ "e(t)at =F(b) — F(a) 


where F(x) is such that (3) holds. 

Now, while every fundamental function has a derivative that one can look-up in 
tables that have been generated over the years and decades and centuries that the 
differential calculus has been studied, not every function has an “anti-derivative.” 
Given a fundamental function (or a composite function built-up from the fundamen- 
tal ones) that we’ll call F(x), one can always compute (in a finite number of steps) an 
f(x) such that 


d 
ae F(x) =f(x) 


but (and seemingly perversely) there is an infinity of easy-to-write-down functions f 
(x) such that there is no F(x) satisfying 


[tes dx = F(x). 


One such f(x) is the object of study in this book, Hardy’s e~ x’ and what makes this 
particular situation really frustrating is that this particular f(x) appears as the inte- 
grand (or as part of the integrand) in a multitude of integrals that arise in important 
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physical problems of interest to physicists and engineers, as well as in theoretical 
analyses of considerable interest to mathematicians.” To make matters even more 
perplexing, it turns out that even though there is no F(x) for f(x) =e7 *’__and so the 
indefinite integral can’t be written down*—it nevertheless is possible to compute the 
definite integral for the particular limits of minus infinity (or zero) to plus infinity. 
This state of affairs has been described as follows: “Confusion is caused 
[in students] by the revelation that some integrands without [anti-derivatives] can 
nevertheless be integrated between certain limits. ... Methods for performing such 
integrations are rarely mentioned in standard courses . . . as they call not for parrot- 
style regurgitation of standard integration rules, but some real mathematical insight 
and lateral thinking [my emphasis].”* A dramatic example of this is that since 1733 
it has been known that [ ree dx = ,/z. Since e ~* is an even function, it is also 
true that ie dx = 5 /m, and a simple change of variable shows. that 
f ae 2" dx = Jon. I'll feel free to move from one or another of these expressions 
as the book progresses, with the choice at the moment depending on convenience. 
So, we come to one immediately obvious question: if there is no “anti-derivative” 
for e~* , where does the ,/min those formulas in the previous paragraph come from? 
As one writer put this, “Students who learn probability ... are frequently surprised 
by the fact that [the probability integral] contains a factor of \/2z. Since x is the ratio 
of the circumference of a circle to its diameter, where [they wonder] is the circle?” 
When I read that, I gave a nostalgic nod in agreement. Indeed, just the year before, in 
the first chapter of my then recently published book Inside Interesting Integrals 


(Springer 2015), I had discussed the integral [ - 1+xdx =. I first came across 


1L=x 

this integral (from the lifting theory of aerodynamics) decades earlier, when taking a 
summer-school class in calculus before my senior year in high school. In my book I 
wrote “I of course knew even in those long-ago days [of 1957] that x was intimately 
connected to circles, but I didn’t see any circles [in the integral].” I suspect this same 
puzzling question occurs to numerous students on a regular basis. 

So, again, how does ,/z appear in the probability integral? Read on, and you’ll 
find out—fifteen different ways! 

But before we get started, let me say a few words on how I imagine the audience 
for this book. Broadly speaking, with two exceptions (that Ill elaborate on in just a 


?To list a few historical examples: the analytic theory of heat developed in the early 1800s, the 
theory of transmission lines (the trans-Atlantic telegraph cable in the mid-nineteenth century), and 
information and communication theory (which is saturated in probabilistic arguments) developed 
during the twentieth century. 

In 1833 the French mathematician Joseph Liouville (1809-1882) explained why the world is, to 
use Hardy’s phrase, “not made that way,” and mid-way through this book (Chap. 8) P’ll show you 
how Liouville did that. 

4AD. Fitt, “What They Don’t Teach You About Integration at School,” The Mathematical Gazette, 
March 1988, pp. 11-15. 

> Hsuan-Chi Chen, “Newton’s Semicircle and the Probability Integral,” The Mathematical Gazette, 
July 2016, pp. 317-321. 
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bit) if you have completed a high school AP-calculus course you should be able to 
understand 80% of this book, with the other 20% coming after some personal study 
on the particular issues that are causing difficulty. These issues will be different for 
different readers, but I have attempted to anticipate as many of these concerns as 
possible and have added eight technical appendices that I think will address most of 
those problem areas. The appendices each contain detailed developments of topics 
usually left to advanced college courses (there isn’t room for everything in the first 
year!), but since the math itself is still just freshman calculus I have decided not to 
simply quote results, but to go a step further and derive them. 

There is a great deal of such supporting math in the appendices, math that is said 
to be “advanced” only because it often comes after the freshman (or sophomore) 
years in college. If you read this book, then in addition to learning about the specifics 
of the probability integral, you'll also see a lot of related mathematics that is 
important in its own right, and that’s a pretty good side benefit. 

If you have actually finished the first full year of an undergraduate college 
calculus sequence, then there should be nothing in this book that will stop you 
(except, perhaps, for those two exceptions). In any case, this book is intended to be 
entirely self-contained. 

To be a bit more specific about the level at which this book is written, here’s an 
example of what to expect. Calculate the value of 


im, ee "dx=? 
Ifn = | the integral is the probability integral (the whole point of this book), and as n 
goes off to infinity the integral seems to grow ever more terrifying. In fact, we can 
answer this question almost in our heads. Start by writing 


oO 
_y2n _ y2n _ x2n 
[ eta} eax f e * dx. 
=o |x| <1 |x| >1 


For Ixl < 1, as n > o we see that x" _, 0 and soe~* — 1. And for Ix! > 1, as 
n— oo we see that x" > +00 andsoe~* —0. Therefore, as n — oo the integrand 
becomes | for —1 < x < 1 and 0 for all other x. That is, the integrand looks like 
Fig. Pl and so, from the area interpretation of the Riemann integral (look at the 
shaded area), we instantly see that 


Fig. P1_ What e~*” looks , Integrand 


like in the limit n > oo ; 
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Fig. P2. The Wolfram integral value 
integrator at work 1.77245 


1,81280 


10 1.94701 
100 1.99428 


1.85544 


aa 2n 
lim e * dx=2. 
n—oo 
— oo 
In making this conclusion, I’ve reversed the order of the limit and the integration, 
to argue the assumption 


im, eo" dx= / jim e~ x ax. 
I write assumption because such a reversal isn’t always legal (see note 3 in Chap. 2), 
but I generally brush such concerns aside in this book. Rather, my philosophy on 


“doing math” is in-line with these words, written by two MIT mathematicians: 


The lack of a rigorous proof should never be an obstacle to constructive analysis. In fact, 
without [math experiments] there would be very little grist for the mill of rigor. As long as 
the results ... make sense and can be tested [my emphasis] ad hoc methods can and should 
be used freely.° 


This was actually a fairly radical position for a mathematician to take 50 years ago, 
but one that is far more acceptable today. Now, none of what I’ve just written should 
be taken as an attack on mathematical rigor. Rigor is generally good but it is not, as 
the Nobel-prize-winning physicist Richard Feynman (1918-1988) liked to say, 
when taken to the point that rigor mortis sets-in!’ In a book like this one, a collection 
of introductory essays with a strong historical flavor to them, following the tradi- 
tional fashion in math books of citing theorems and lemmas at every turn, is surely 
not going to excite the imaginations of my imagined audience. I think most engineers 
and physicists will go along with me on this, and I can only hope that ultra-pure 
mathematicians will forgive me. 

In the particular example I just gave, testing the assumption is easy: we’ll just 
numerically evaluate the integral for progressively increasing values of n and see 
what happens (I used the Wolfram on-line integrator to do this). The results are 
shown in Fig. P2. 


°Harvey P. Greenspan and David J. Benney, Calculus: an introduction to applied mathematics (2nd 
edition), McGraw-Hill 1973, p. 122. 

7Feynman would have had little sympathy for the position of Lemuel Falk, the Russian mathema- 
tician in Robert Littell’s hilarious 1994 novel The Visiting Professor. Professor Falk (famous for his 
study on the randomness of the digits in the decimal expansion of pi) was so cautious that his mantra 
was “two plus two appears to be four.” 
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The idea that what is usually thought of as being the purest of academic 
studies, mathematics, has fashions that come-and-go is often a shocking 
revelation for students. Here is another one. As the Second World War reached 
its end, the high-speed electronic computer came into existence in the form of 
the ENIAC (Electronic Numerical Integrator And Computer). In the years that 
followed, ever-more sophisticated machines were built, with one of the more 
famous being the MANIAC (Mathematical Analyzer, Numerical Integrator 
And Computer) that helped make important advances in answering questions 
concerning atomic physics. One of the early fans of the MANIAC was John 
von Neumann (1903-1957), a mathematician at the Institute for Advanced 
Study (IAS) in Princeton, NJ, who decided he wanted to build an even more 
powerful computer. The IAS computer project, however, almost immediately 
encountered severe criticism from von Neumann’s IAS mathematician col- 
leagues, who felt actually building something to be an exercise in “mere 
engineering” and thus inappropriate for the Institute. In the end, von Neumann 
did prevail, but low-level grumbling persisted for quite a while. Today, of 
course, computers are literally everywhere, including most of the offices of the 
IAS. One amused observer of this curious struggle over what was “fashion- 
able” in mathematics was the mathematical physicist Mark Kac (1914-1984) 
who was then (1951-1952) at the IAS as a visiting scholar. Years later, 
thinking back on those events, Kac wrote (in his autobiography Enigmas of 
Chance) these brief but to-the-point words about the IAS computer affair: 
“Sounds silly in retrospect.” 


Now, what about those two exceptions I mentioned earlier, in my claim that 
students at the high school AP-calculus and college freshman level can read most of 
this book? Those exceptions occur at the end of the book, in the discussions of 
Cauchy’s and Fourier’s—after the French mathematicians Augustin-Louis Cauchy 
(1789-1857) and Joseph Fourier (1768—1830)—imagined derivations of the proba- 
bility integral. Contour integrals and Fourier transforms are, of course, definitely not 
part of either AP-calculus or college freshman calculus, so how do I justify 
Chaps. 12 and 13? 

I have two reasons for including Chap. 12. First, it is supported by Appendix F, 
where I show you enough complex function theory to allow you to follow the 
arguments in Chap. 12. And second, for the historical reason that, for quite a long 
time, it was commonly believed that Cauchy’s technique simply could not handle the 
probability integral. Cauchy, himself, believed this. And then, one day (to the 
astonishment of many), it was found that it could. Now that’s just too good a story 
to leave out of the book! As for Chap. 13, it is included both because the math is so 
beautiful and, of course, to make our story complete. 

Earlier I made the claim that reading this book would bring a lot of interesting 
math into play, in addition to the probability integral, itself. As my final technical 
calculation of this opening essay, here is a neat example in support of that claim. The 
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concept of dimensional spaces beyond the three dimensions of everyday life occurs 
in many areas of science and engineering. For example, when studying codes that 
can both detect and correct errors that occur during the transmission of digital 
information, electrical engineers find that four (and even higher) dimensional objects 
arise in a natural way, and so they speak of hypercubes and hyperspheres and make 
great use of the geometry of these other-worldly objects. Specifically, let me now 
show you how the probability integral can be used to easily calculate the property we 
call the volume Vq of a sphere of radius R in d-dimensional space (where d is any 
positive integer).® 

You already know (almost certainly since the time you entered high school) the 
answers for the first three values of d. In one-dimensional space (d = 1), the 
“volume” of a sphere is defined by the space occupied by the collection of all the 
points that are within distance R of the sphere’s center. That collection clearly 
defines a line segment that extends distance R in either direction from the center 
point. That is, a sphere in one-dimensional space is a line segment of length 2R and 
so V,; = 2R. For the d = 2 case, the sphere’s “volume” is the space occupied by all 
the points that are within distance R (in a plane) of the sphere’s center. That 
collection clearly defines the area of a circle with radius R, and so V2 = xR*. And 
for the d = 3 case, we have the well-known (see note | in Chap. 11) V3 = 4nR’, As 
suggested in that note, we expect the general result to be Vg = caR", where cq is a 
factor that is a function of d, alone (and so c, = 2, cp = a, and c3 = +m). Our problem, 
then, is the calculation of cy for all positive integer d. 

We start with the centerpiece of this book, which we will establish over-and-over 
again as we move from chapter to chapter: 


/ e-*dx= Vn. 


— co 


If we multiply this expression by itself d times, we have 


(| eax (/ eax.) eee (/ eax, ) = (Va )o =n? 
=| / vee f (cote See ve \dxidxs © + edxg 


and so 


8Science fiction is the field of study that usually comes to mind when the fourth (and beyond) 
dimension is mentioned, but quite serious mathematical physicists have made use of the concept, 
too. The existence of a five-dimensional space (to unify the theories of gravity and electromagne- 
tism), for example, is assumed in the paper by J. G. Bennett, et al., “Unified Field Theory in a 
Curvature-Free Five-Dimensional Manifold,” Proceedings of the Royal Society of London, July 
1949, pp. 39-61. You can also find some discussion of the science fiction uses of high-dimension 
spaces in my book Time Machines, American Institute of Physics Press/Springer (2nd 
edition) 1999. 
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The integral in the box almost certainly looks pretty awful to you, but it has a 
beautiful physical interpretation which instantly clears away the fog. 

Imagine each of the d x; as a coordinate in a rectangular system in d-space. Then, 
if r is the length of the radius vector from the origin of that system to the point (x,, 
X,...,Xq) in d-space, we see that 


Also, the product of the d dx; differentials is the rectangular differential volume in 
d-space. That is, 


d 
[]_,4%1 =4v«. 


Thus, the d-fold integration in the box is simply the integration of e~" over the 
entirety of d-space (entirety, as each x; goes from —oo to oo). As stated, however, the 
boxed expression is still difficult to use as it involves a mixture of coordinate 
systems: r is a spherical coordinate, but the expression for dVq is the differential 
volume in rectangular coordinates. We need to write dV, as a function of r, alone, if 
we are to be able to evaluate the integral. Fortunately, it is easy to get that alternative 
expression for dV4q. 

Imagine a spherical bubble in d-space, with radius r. As argued before, the 
volume of this bubble is Vg = car, and so 


r 


dV, = 
ae = dcar* 


1 


or 
dVq = dcgr*~ !dr. 


With this, our d-fold integration in the box collapses to a single integration in the 
single variable r (which must vary from 0 to oo to encompass all of d-space). So, we 
have 


[o-@) 
2 = 
i e7* degr?~ dr =2"/?, 
0 


giving our result for cg as 
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xa/2 
Ca = Ga 
a/ e~?yrd— 1dr 
0 
In the integral, change variable to u = r° (and so au =2r=2u!/? or, dr= om 7 du). 


Thus, 


[o-@) [o-@) d-1 [o@) 4 
| efi tars | ae, saesu=5 | e- "uy?! du. 
0 0 2u 2 Jo 


When you get to Appendix D, you will see that its Eq. (1) shows that this last integral 
is r(9), where I’ is Euler’s gamma function that generalizes the factorial function 
from just positive integers to all the real numbers. So, the expression for cg becomes 


In Appendix D, you will learn that the gamma function has the properties 
(a) [+ 1) = nT (n) 

(b) Td) = 1 

© 1) =va 


From (a), we see that if n= g then 


4/2 
rH) 


and so, at last, we have our answer: Vg = cyR! where cg = 


For d = 1 we have, using (a) and (c), 


For d = 3 we have, using (a) and (b), 
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3/2 t/t r/R t/t tmx 4 


r+) 1@ G+) FRO) va 


C3 = TT. 


Thus, our formula for cg does, indeed, give the values we knew at the start for the 
first three values of d. For the first “new” result, the d = 4 case (giving the volume of 
a unit radius four-dimensional hypersphere), we have, using (a) and (b), 


_ im x x a x i) 


“=TG+1) TG) Tr@+D 2e) 2a+t 2r@)~ 2”° 


To check if you have got these calculations mastered, see if you can confirm that the 
volume of a unit radius hypersphere in the fifth dimension is cs = Xx’. 

All this from the probability integral. Amazing!” 

Okay, time to wrap-up. I opened this essay with a quote from an eminent 
mathematician, so let me close by telling you of two more mathematicians, both of 
whom are somewhat less eminent as they are cellmates on Death Row at the State 
Prison. Their particular crimes are not important in this tale, and in any case are 
simply too awful to be repeated (the rumor is the least of their criminal behavior 
occurred when they reversed the order of a double integral without justification, but 
that may simply be a vicious slander). The two have just finished lunch and are 
waiting for the arrival of the warden who, only a few minutes earlier, had received 
the Governor’s decisions on their last-minute appeals for mercy. 

When the warden arrives, the news is grim: both appeals have been denied and 
they are to be executed tomorrow evening at the stroke of midnight. The warden asks 
if either has any final requests, to which one of the doomed mathematicians quickly 
answers, “Yes, I do. I have just found a new derivation of the probability integral, 
and would like to give a lecture on it, to the entire prison, right after dinner tomorrow 
night, so my discovery won’t die with me.” 

“Sure,” says the warden, “that sounds great.” Then, turning to the other mathe- 
matician, he asks “And how about you? Any final wish?” 

“Yes,” comes the immediate reply, “a simple one. Please move my execution 
time up to dawn tomorrow morning, so I won’t have to sit through another after- 
dinner talk on the probability integral.” 

This story is one that might work as an after-dinner joke, for a speaker to offer to 
the audience as an apology for making everybody sit through another after-dinner 
talk. Mathematicians, in particular, should appreciate the recursion! The reason I’ve 
told you this is to draw a distinction between you and that second mathematician, 
because this entire book is, in fact, one after-dinner derivation of the probability 


°Here is a little “for fun” challenge problem for you. Comic book aficionados will remember that 
Donald Duck’s rich uncle, Scrooge McDuck, was so stupendously wealthy that his money occupied 
a volume of three cubic acres. Since an acre (43,560 square feet) cubed puts us in the sixth 
dimension, calculate the radius (in feet) of Scrooge McDuck’s money if stuffed into a 
hyperspherical bag. 
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Henes witene ou 
ModE YOU MISTAKE.” 


Fig. P3_ Every mathematician’s nightmare. (Drawing by arrangement with Sidney Harris, Science 
CartoonsPlus.com) 


integral after another, and I am betting that you would never decide to skip yet 
another one. 

While writing this book, I have approached the discussions not only from the 
point of view of a math historian, but also that of an (electrical) engineer who 
appreciates the delicate interlocking of the steps in an intricate mathematical proof. 
Each one of the derivations you’ll read here is a testament to just how fiendishly 
clever were the analysts of the past. The very fact that there are numerous derivations 
of it is a measure of the importance mathematicians and scientists attach to the 
probability integral. Only the Pythagorean theorem has more proofs (there is no such 
thing as too many proofs!), numbering in the hundreds!'° 

What makes us laugh when we look at Fig. P3 is that it illustrates the real worry in 
the mind of any mathematician who has just written-up a new proof. To put it 
bluntly, the fear that—somewhere—in all the clever analysis, one has made an 
incredibly subtle (that’s a gentle, alternative word for dumb) error. Lord, maybe 
even more than one! This book describes a lot of proofs—could there be errors 
lurking in any of them? I don’t think so, but I would be willing to bet that, when first 
written down, their creators thought about the horror(s) of others later discovering a 
misstep. 

Until now all the material in this book has been scattered across the printed 
literature (available only to those with either walk-in or electronic access to a college 
library) and/or tucked away in little side-nooks of the internet. Now you have it all 
under one cover, with a drop of historical glue here-and-there to hold everything 
together. 


10 See Eli Maor, The Pythagorean Theorem, Princeton 2007. 


Preface 


Ode to the Probability Integral 


There are numerous integrals not easy to do, 

Even if you hold your breath and turn your face blue, 
But the probability integral—most specifically, 

Is the one that best illustrates—the enormous difficulty, 
In doing a task 

While Nature wears a mask, 
And hides from view 

With nary a clue, 
On just what the devil 

You’re supposed to do! 


Read this book and you will know what to do. 


Durham, NH, USA 
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Chapter 1 ®) 
De Moivre and the Discovery cess 
of the Probability Integral 


For the most part, Pll not say much (if anything) biographical about the men whose 
names are associated with the mathematics that is the central core of this book. For 
this opening chapter, however, I'll make an exception because it was De Moivre who 
started it all. That’s pretty special, and so I'll treat him just a bit differently. 

Abraham De Moivre (1667—1754) was born into a Protestant family in a solidly 
Catholic France.' Potential religious persecution was muted, however, because in 
1598 the French King Henri IV had proclaimed the Edict of Nantes that granted 
many civil rights to Protestants. This allowed De Moivre’s father (a surgeon) to 
provide a comfortable, happy childhood for his son. That all changed in 1685 when 
Henri’s grandson, King Louis XIV, proclaimed the Edict of Fontainbleau that 
revoked the Edit of Nantes. Overnight, Protestant worship was forbidden, all 
children were ordered to be baptized by Catholic priests, Protestant schools were 
closed, and Protestant churches were destroyed. For a male Protestant to refuse to 
convert to Catholicism (or even worse, to attempt to flee France) could lead to 
imprisonment. 

Protestant life in France became sufficiently difficult that many decided they 
could no longer remain in France. One of those immigrants was De Moivre, and no 
later than early 1688, a youth no older than 20 or 21, he had crossed the English 
Channel and taken-up residence in London where he lived the rest of his life 
(eventually becoming a full citizen of England). France’s loss would be 
England’s gain. 

To support himself, De Moivre became a private tutor-for-hire in mathematics to 
the young sons of London’s wealthy, titled, upper-crust society. This required him to 
walk from his lodgings to the homes of his students, and so he spent a lot of time and 


' The biographical details concerning De Moivre that I present here are based on the paper by David 
R. Bellhouse and Christian Genest, “Maty’s Biography of Abraham De Moivre: Translated, 
Annotated and Augmented,” Statistical Science, February 2007, pp. 109-136. See also the older 
but still quite interesting paper by Helen M. Walker, “Abraham De Moivre,” Scripta Mathematica, 
November 1934, pp. 316-333. 
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energy not doing math, but rather simply walking around London (but surely 
thinking about math as he walked). On the positive side, his teaching activities 
brought De Moivre in contact with families that had ‘connections’ at the highest 
levels, and he eventually became acquainted with some of the elite of English 
science. In particular, by 1692 he had struck-up a friendship with Edmond Halley 
(1656-1742) of Halley’s Comet fame, and Halley’s friend, the great Isaac Newton 
(1643-1727). De Moivre’s mathematical talent was eventually formally recognized, 
and he was elected to membership in the Royal Society in 1697, just 10 years 
after fleeing France. Much later came membership in the Berlin Academy and, to 
his enormous pleasure, to the Paris Academy (although only a few months before 
his death). 

Above all else, De Moivre yearned for a real teaching post. In a letter to a friend, 
dated December 2, 1707, he wrote “If there were some position where I could live 
tranquilly ... I should accept it with all my heart.” Yet, despite the many collegial 
relationships he developed over the years, De Moivre was never able to obtain an 
academic appointment. His would always be a life of tutoring, and solving particular 
mathematical problems for paying clients (especially those who presented him with 
questions in probability; that is, De Moivre was the ‘go-to-guy’ for questions posed 
by London gamblers—see the example at the end of Appendix A). Rather than living 
a scholarly life in academia, his creative work is associated, not with Cambridge or 
Oxford, but rather with various London coffee-houses where he held court and 
offered mathematical consulting services. 

De Moivre’s name is, today, associated with two great ideas: De Moivre’s 
theorem and the central limit theorem of probability. The first, in modern notation, 
follows directly from the identity” 


e* = cos(x) +i sin(x), i=/V—1. 
Thus, 
(e*)" = { cos(x) + 7 sin(x)}" =e™* = cos(nx) +i sin(nx), 
and so 


cos(nx) = Real Part of { cos(x) +i sin(x)}" 


sin(nx) = Imaginary Part of { cos(x) + i sin(x)}". 


°Today we call this Euler’s identity, after the Swiss-born mathematician Leonhard Euler 
(1707-1783), but it was known (in a different form) to De Moivre long before 1748 (when it 
appeared in Euler’s writings). 
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For n = 2, for example, by equating real and imaginary parts we can easily find the 
double-angle formulas derived in high school trigonometry, formulas that are usu- 
ally derived via more complicated geometrical arguments. For n = 3 you can just as 
easily find the triple-angle formulas that (as far as I know) are beyond any geomet- 
rical arguments that appear in a high school trigonometry book. 

The second idea, the central limit theorem, is directly related to the probability 
integral which forms the basis for this entire book. 

De Moivre’s interest in the mathematics of probability ‘probably’ was the result 
of the interest, among many of his tutoring clients, in how to determine the odds of 
various games of chance (games usually involving the tossing of dice or the playing 
at cards). One of the mathematical tools in doing such calculations is the binomial 
theorem (see Appendix A), a theorem of great interest to De Moivre’s friend, Isaac 
Newton. Newton, however, seems not to have been much interested in that particular 
application of the theorem,’ but De Moivre took to it with great energy and 
enthusiasm (and success). An often-told anecdote is that when asked certain ques- 
tions in mathematics, Newton would reply “Ask Mr. De Moivre, he knows more 
about it than I do.” I suspect those questions were almost surely about probability. In 
his later life De Moivre’s interest in probability did turn to the more ‘morally 
acceptable’ subject of analyzing the mathematics of life insurance and annuities, 
but gambling is the historical origin of probability theory, and it’s where De Moivre 
started, as well. 

As discussed in Appendix A, if we flip a fair coin n times the probability of 


getting exactly k heads is @) = oe and so in n flips the total probability for the 


event that the number of flips on which heads occurs is between a and b (a < b) is 
given by 


1 b n 
c= #ae(; 


This is mathematically correct, but if n is a large number of flips it can be a 
computationally demanding exercise. For example, suppose n = 1000, a = 480, 
and b = 520. Then, 


With one notable exception involving a wager on a game involving the tossing of multiple dice 
(along with some minor publications on insurance annuities), Newton avoided applying his genius 
to questions of probability. Pll not get into the ‘notable exception’ here (being irrelevant to the 
subject of the probability integral) but, if you’re interested, you can find a discussion of “Newton’s 
Probability Problem” (along with some speculations on why Newton avoided probability) in my 
book Will You Be Alive 10 Years from Now? Princeton 2014, pp. 13-16. Much of what I say there is 
based on the paper by Stephen M. Stigler, “Isaac Newton as a Probabilist,” Statistical Science, 
August 2006, pp. 400-403. 
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1 520 1000 
P= 71000 er te ) (1.1) 


and we are faced with the calculation of dozens of factorials of large integers, which 
results in even (much) larger numbers. 2'”°° is pretty big, too! 

One way to get around this complication is to use logarithms. As shown in 
Appendix C (which itself uses the developments of Appendix B), what is called 
Stirling ’s formula gives us the following approximation for the factorial m!: 


m! ~ /2am™*!/2e-™ 


and so 


In(m!) ~ Z In(2x) + (m + 5) In(m) — m. 


Even if m is ‘large,’ with logarithms we will be dealing with much smaller numbers. 
Figure 1.1 shows our plan of attack for computing P from (1.1). 

There is strong evidence that this was the general approach first used by De 
Moivre in calculating probabilities like the one in (1.1). When he published his 1730 
Miscellanea Analytica, many copies of it included a Supplementum that ended with a 
table of 14 figure logarithms for factorials from 10! to 900! in steps of 10.4 It’s duck 
soup to program the logic of Fig. 1.1 on a modern computer—for (1.1), the result is 
P = 0.8054 for the probability a fair coin will, in a thousand flips, show between 
480 and 520 heads—but to do this by hand in the early 1700s must have been simply 
brain-muddling. And so, understandably, De Moivre searched for a better way to do 
the arithmetic in sums like (1.1). 

De Moivre’s thinking was almost certainly inspired by looking at plots like the 
one in Fig. 1.2, which shows the binomial terms for a fair coin flipped 100 times (that 
is, the plot shows the probability that exactly k heads show on 100 flips, where 
0 <k< 100). We see a symmetric, bell-shaped curve with its maximum value at the 
midpoint of the curve. The curve rapidly falls off on either side of the midpoint. 


*Karl Pearson, “Historical Note on the Origin of the Normal Curve of Errors,” Biometrika, 
December 1924, pp. 402-404. Pearson (1857-1936) was a mathematician at University College 
in London. 


> As a quick check on Fig. 1.2, which shows the probability of the middle term (exactly 50 heads) is 


~0.08 for n = 100 flips of a fair coin, you should confirm (use Stirling’s formula) that a (22) = Te 


and so, as m= sn = 50, the middle term probability is Te = 0.07979, which is really in pretty 


good agreement with the plot. 
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Fig. 1.1 Calculating the Start 
probability P from (1.1) 


For each k in (1), use Stirlings’s formula 
to compute the logarithm of all the 
factorials 


Use the factorial logarithms to compute 
the logarithm of each term in the 
summation of (1) 


Convert each term from its logarithm to 
the actual value of the term 


Add the actual value of the terms 
Compute the logarithm of the sum 
just computed 


Subtract 1000log(2) from the logarithm 
just computed (to perform the division 
by 21000) 


Convert the logarithm just computed 
to the actual value of P 


Done 
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Fig. 1.2. The binomial distribution is bell-shaped (n = 100) 


Suppose we denote the value of the middle term in the binomial expansion of 
(1 + 1)” = 2” by M, and the value of the term at distance d from M by Q. If we set 
n = 2m, then 


The bell-shaped curve is symmetrical about M, and so our result for a will be the 
same on either side of M, hence the plus/minus sign. In other words, notice that 


Go a) = ee a) = Hina 


Continuing, 
(2m)! 
Q_ mrdim (ml) 
M (2m)! (m+ d)!(m—d)!" 
mim! 


Using Stirling’s formula (as did De Moivre), this continues as 
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Q m2™+1e—2m mm 
M~ (m+ Geta [(m— ay" He-meal [(m + ame] [oma 4) 
= m™m 
(m+ d)"(m + d)*(m + d)?(m —d)"(m— d) ~“(m —d)? 
mm mn 


(m+ d)™(m—d)"™(m + 4)(m—d)~4(m+d)(m—d)? (m2? — 2)" (B42) "Vin? =? 


m d 
Q. nn’ m—d\ .d 
Mo ae ice a ete Se 


Taking logarithms on both sides, 


or, again because a <i, 


2 
n($) —m In c = (=) | +d in| (1 = *) (1 = “)| 
M m m m 
2 
—m at = (5) | +d in| —2 ‘|. 
m m 
Remembering that In(1 — x) = — x if x is ‘small,’ we have 


Q)\ _ d\? dj) @&@ .@ @ 
In) * n| (5) +a 25 | = 2—= 


m m m 


R 
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and so 


(1.2) 


and we start to see, with the squared appearance of a in the exponential roll-off of 


the probability curve, the emergence of the very beginning of the probability 
integrand. 

By 1730 De Moivre had taken the final steps from (1.2) to this wonderful result: 
for a fair coin (p= 5), 


k n—k n 
r=sins.() QQ atin k2.(°) 
n—oco =a n—co =a k 
b—4n-1) (1.3) 
lA 2 
! wa dx. 


= 1 
J2n a—s(n+1) 
ha 


P is the probability—as the number of flips n of a fair coin becomes large—that the 
coin shows a number of heads somewhere in the interval a to b, a < b. Decades later 
(1812) the French mathematician Pierre-Simon Laplace (1749-1827) extended De 
Moivre’s result to the general case of a biased coin, with his result reducing to (1.3) 
for p= is If you plug n = 1,000, a = 480, and b = 520 into (1.3) you get (I used the 
Wolfram on-line integrator) 


Fiat ent 2 _ 2.01835 
xq “~~ = 0.8052, 
7 Zn Tine Von 


which compares quite well with our earlier evaluation of the sum in (1.1) using 
logarithms. 

And now we can see how the probability integral emerges from all this. If we let 
a— 0 and b— + oo (that is, if we ask for the probability a fair coin shows a number 
of heads that is in the interval zero to plus infinity, a situation that obviously includes 
everything that could possibly happen and so has probability 1) we have 


©De Moivre’s result in (1.3), and Laplace’s generalizations of it, are today a special form of what is 
called the central limit theorem of probability. The word central is used in the sense of importance. 
This is a modern label dating from 1920 when it was introduced by the Hungarian-born American 
mathematician George Polya (1887-1985). 
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-Pdx=1 


1 l 
—— e 
Vv 2n —0o 
or, at last, we arrive at what the rest of this book is all about: 
/ eo dx = V2n. (1.4) 


Bell-shaped curves like Fig. 1.2 are, today, often called Gaussian—after the great 
German mathematician Carl Friedrich Gauss (1777—1855)—who used related prob- 
abilistic arguments in his studies of observational errors in astronomical work.’ 
Another name for bell-shaped curves is that they are normal, because they seem to 
occur in an endless number of probabilistic situations (such as the distribution of IQ 
and examination test scores,® the daily milk production of a herd of cows and the 
weekly egg production of a hen house, the amplitude of random electronic noise in a 
received radar echo, and ...). The common appearance of bell-shaped curves was 
amusingly commented on by the French mathematician Henri Poincaré (1854-1912) 
in his 1896 book Calcul des Probabilitiés, where he wrote “Everyone believes in 
[the generality of bell-shaped curves]: experimentalists believing it is a mathematical 
theorem, mathematicians believing it is an empirical fact.” 

The importance of De Moivre’s development of the analytical nature of the 
binomial bell-shaped curve is simply impossible to exaggerate. What he had tumbled 
to is what mathematicians today call a probability density function (or pdf). A pdf is 
associated with a random quantity (let’s call it X), with the pdf of X written as fx(x). 
Every time we measure X we get a different value (and so X is also called a random 
variable). We can calculate the probability of getting that value if we know fx(x). 
The pdf is said to be a density function because [Pf (x)dx is the probability X has a 
value in the interval a < x < b. 


"It was this part of Gauss’ work that prompted Glaisher (see his epigraph at the opening of this 
book) to suggest the adoption of what he called the error function (defined as erf(x) = zi fi e~" du 


) as a new function, to join the traditional functions of mathematics. Since his proposal of 1871, that 
has, in fact, happened (just look in any good collection of math tables). 


’The typical bell-shaped curve of test scores is the origin of the phrase ‘the prof grades on the 
curve.’ 


°Some historians believe this pithy observation is actually due to the French physicist Gabriel 
Lippman (1845-1921), and that Poincaré was simply repeating Lippman. Who ever said it, you 
can’t deny the truth of the words. A quite interesting essay on the subtle details behind the 
ubiquitous nature of bell-shaped curves is by Aidan Lyon, “Why Are Normal Distributions 
Normal?” British Journal for the Philosophy of Science, September 2014, pp. 621-649. 
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All pdfs have the following two defining properties: 
PI: fx(x) > 0 for all x 


and 
P2: [o ix(x)dx= 1. 


P1 avoids the possibility of X having a negative probability (whatever that might 
mean) for having a value in some interval, while P2 says the probability is 1 that 
X has some value once it has been measured. 

The particular pdf De Moivre had found in his studies of the binomial distribution 
is generally written today as 


e 2 ,-O<x<o, (1.5) 


which is called a Gaussian pdf.'° It is bell-shaped, with its maximum value occur- 
ring at x = m (called the mean of X). The amount of spreading around the maximum 
(which gives a bell-shape to the pdf—as shown in Fig. 1.2—is determined by the 
value of the positive constant o called the standard deviation of X). (To get a 
sequence of Gaussian random numbers with a mean of M and a standard deviation 
of S from a sequence of Gaussian random numbers with a mean of zero and a 
standard deviation of one—see the box at the end of this chapter for two ways to get 
such a sequence—simply multiply the second sequence by S and then add M to each 
term in the sequence.) 

As mentioned earlier, Gaussian random variables seem to occur in numerous 
physical situations (see the Epilogue of this chapter for a specific example from the 
age of nuclear weapons), but it isn’t the only important pdf. Equally important is the 
uniform pdf. If we denote a uniformly distributed random variable by Y, where 
Y takes its values from the interval a < y < b, then the pdf of Y is 


in that interval and zero outside that interval. This pdf clearly satisfies P1 and P2. The 
uniform pdf applies to random variables such that, over their entire interval of 
possible values, there is no reason to assume any particular value is more or less 
likely to be measured than any other value. 


‘OT am speaking casually here, as the Gaussian pdf is the limiting, continuous case of the discrete 
binomial distribution. De Moivre studied the number of heads in multiple flips of a fair coin, a 
number that must physically be an integer, and not just any real number. In a probability text (which 
this book makes no claim to be) such details are carefully handled, but here I think such details 
would be more of a distraction than anything else. 
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probability density 


2.5 3 3.5 4 


Fig. 1.3. The ‘bell-shaped’ probability density functions for the sum of n independent random 
variables each uniformly distributed from 0 to 1 


So important is the uniform pdf that every modern scientific programming 
language has a built-in random number generator that, each time it is invoked, 
generates a random value from a distribution uniformly distributed from 0 to 
1. From this specific generator, one can then generate the values of any other random 
variable with a specific pdf.'! One form of the central limit theorem even more 
general than is (1.3)—-see note 6—says that if one takes more and more independent 
random variables that are identically distributed (have the same pdf) and adds them, 
the result converges to a random variable with a Gaussian pdf. This is not a trivial 
result to prove, but we can see it in action with the aid of a computer. 

Suppose we add together n independent random variables, each uniform from 0 to 
1. The result will, of course, be a random variable with a value from 0 to n. If we do 
this a very large number of times, we can construct an estimate for the pdf of the sum. 
In Fig. 1.3 this has been done for three values of n. As you can see, for n = 2 the 


"How the uniform 0 to 1 generator works (and how to use it to generate the values of random 
variables with other specific pdfs) involves mathematics that P’ll not attempt to get into here. 
(If curious, however, you can find elementary discussions in my two books Duelling Idiots and 
Other Probability Puzzlers (2012) and Digital Dice (2013).) The availability of software-based 
random number generators allows the writing of what are called Monte Carlo simulations of 
random processes (the name—due to the physicist Nicholas Metropolis (1915—1999)—is a refer- 
ence to the famous casino on the French Riviera), with which one can use a computer to study 
problems that have no (known) analytical solutions. Metropolis was the leader of the team that built 
the MANIAC electronic computer at the Los Alamos National Laboratory after the Second 
World War. 
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result is a triangular pdf (which already is starting to ‘look bell-shaped’), while for 
n=3 andn = 4 the pdf curves are definitely bell-shaped.’ This doesn’t prove the 
central limit theorem, but it is certainly suggestive. 

Probability theory has come a very long way since De Moivre, but he started its 
modern analytical development. Now, admittedly, that is a bit of somewhat more 
loose talk. Decades before De Moivre the Italian mathematical physicist Galileo 
Galilei (1564—1642) had already applied analytical thinking to a gambling problem 
involving the tossing of dice (for details, see Will You Be...., cited in note 3), but 
that was via an exhaustive listing of all possible ways three dice could land and so 
was a problem in discrete probability, a problem solved by ‘mere’ counting. To 
handle problems in continuous probability, involving pdfs like the Gaussian, or the 
uniform, requires mathematics (far) beyond what was known to Galileo (but see the 
Epilogue to this chapter for just a bit more on this point). 

De Moivre knew (1.4) when Euler had only just recently created the gamma 
function (see Appendix D), and decades before either Laplace or Gauss were born— 
but be careful to note that his knowledge came indirectly via a physical argument: 
the probability of an event certain to occur is 1. A direct, purely mathematical 
evaluation of the probability integral, using no special physical interpretation based 
in probability or anything else was, in 1733, still an open problem. Such a derivation 
apparently waited until 1774, when Laplace—traditionally (but almost certainly 
incorrectly) said to be the first to calculate the probability integral—finally did 
it. What Laplace did is the topic of Chap. 2. 


Epilogue 

My earlier claim that problems in continuous probability, even one involving the 
simplest possible pdf (the uniform), would be beyond what Galileo could have 
handled, might not be quite correct. As I thought about that comment, long after I 
first wrote it, it did occur to me that I had often discussed the following problem 
when I taught elementary probability theory to undergraduate students in electrical 
engineering at the University of New Hampshire. It seems to me, now, as I 
reconsider matters, that a genius like Galileo might well have been able to solve 
this problem (although in the following discussion I will be using some terminology 
that would become commonplace among probabilists only centuries after his death). 


So, suppose Ed and Gail agree to meet at their favorite coffee shop later that 
afternoon. They are both pretty casual about details, and agree only that each will 
arrive at the shop ‘sometime’ between 5 pm and 6 pm. That is, if E and G are the 


Porn = 2, a total of 400,000 random sums were computed, for n = 3, a total of 600,000 random 
sums were computed, and for n = 4, a total of 800,000 random sums were computed. These 
numbers keep the ratio of ‘number of sums to the interval width 0 to n’ the same for all three pdf 
plots. Thus, a total of 5.8 million random numbers were used to generate Fig. 1.3. In the early days 
of high-speed computers, this direct use of the central limit theorem was key in supporting Monte 
Carlo simulations modeling physical processes obeying Gaussian pdfs. Adding as many as 8 to 
12 values from a uniform distribution to get one value from a bell-shaped pdf was not unusual. For 
more elegant ways to do that, see the box at the end of this chapter. 
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arrival times of Ed and Gail, respectively, then we’ll take E and G to be independent, 
uniformly distributed random variables, with each taking its value from the interval 
0 to 60 min (time is measured from 5 pm). They further agree that if Ed arrives first, 
he will wait 15 min for Gail (and then leave), while if Gail arrives first, she will wait 
just 10 min for Ed (and then leave). Both will certainly leave at 6 pm, even if their 
waiting time isn’t yet up. What is the probability Gail and Ed successfully meet? A 
Monte Carlo computer simulation is easy to write (something Galileo couldn’t have 
imagined doing, but see the following box), and executing the code several times 
gives estimates for the probability ranging from 0.3714 to 0.3716. 

The Monte Carlo code, meet.m, simulates ten million attempts by Ed and Gail to 
meet. The code is written in MATLAB, but is so elementary it is easily translated 
into any of the other common scientific programming languages. It is so elementary 
(using just one for-loop and three if conditionals), in fact, that I think it is self- 
explanatory. On a quite ordinary laptop the code ran through all ten million attempts 
in just slightly more than half-a-second. 


Smeet.m 
m=O; 
for loop=1:10000000 
e=60*rand;g=60*rand; 
if e<g 
if g-e<15 
piel ¢ 
end 
else 
if e-g<10 
m=m+1; 
end 
end 
end 
m/10000000 


Here’s how to calculate the exact value for the probability of a meeting. We start 
by considering the following two, inclusive cases, based on the observation that the 
probability Ed and Gail arrive simultaneously has probability zero (which doesn’t 
mean such an event is impossible, just highly unlikely). Thus, one of the two will 
‘almost always’ be the first to arrive. Now, 


(a) if Ed arrives first, he has the smaller arrival time (E < G) and so there is a 
meeting only if G — E < 15, and 

(b) if Gail arrives first, she has the smaller arrival time (G < E) and so there is a 
meeting only if E — G < 10. 


The geometry of each of these two cases is shown in Figs. 1.4 and 1.5, respectively, 
using what probabilists call the sample spaces of (a) and (b). 
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Fig. 1.5 Geometry in sample space of a meeting if Gail arrives first 
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If Ed arrives first, the sample space is the triangular region in Fig. 1.4 below the 
diagonal E=G. Each point in that region represents a possible pair of values for 
E and G, but only the points in the shaded region (those above the line E = G — 15) 
are the points in sample space that represent a meeting. So, assuming that Ed arrives 
first, the conditional probability of a meeting (call it P,) is the probability associated 
with the shaded strip in Fig. 1.4. Because of the uniformity of both E and G, we 
interpret the probability of an area as the ratio of that area to the area of the entire 
sample space (mathematicians call this a problem in geometric probability). An easy 
way to calculate that probability is to subtract the probability of the unshaded area in 
the sample space from | (the probability of the entire sample space). Thus, 


P,=1 


4(45)(45) 81 
5(60)(60) 144” 


We construct Fig. 1.5 (Gail arrives first) in the same way, now using the inequalities 
G < E (or E > G) and E — G < 10. The sample space points that satisfy both of 
these inequalities are the points above the diagonal line E = G and below the line 
E = G + 10. The collection of these sample space points is the shaded strip in 
Fig. 1.5. So, assuming that Gail arrives first, the conditional probability of a meeting 
(call it Pz) is the probability of that strip, or, 


5 (50) (50) 100 
P,=1 1 


(60)(60) 144" 


At this point we appear to have an embarrassment of riches. We asked for the 
probability of a meeting, but what we have are two conditional probabilities (that is, 
probabilities conditioned on who arrives first). How do we get a single probability, a 
value without any conditioning assumptions? Here’s how to do that. Since Ed and 
Gail have the same probability of being the one who arrives first then, if we repeat 
this process a total of N times, we expect that Ed arrives first 5N times and that Gail 
arrives first 5N times. The total number of meetings is then given by 
5NP; + 3NP2 = 3N(P; + P2). This number of meetings occurs in N attempts, and 
so the probability of a meeting is 


5 N(Pi + P2) l (p, 4. Py) aC 81) (,_ 100\)_ 17, 181)_,_ 181 
N oo te Al iaa) a( iaa)| 3 iad 288 
107 
= dgq = 0.37152 


in excellent agreement with the Monte Carlo simulation code meet.m. Galileo (and 
De Moivre, too) would have been intrigued (to say the least) at this way of answering 
questions in probability. 
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Another example of geometric probability, of great historical interest, is the 
Buffon Needle Problem. The name comes from its originator, the French mathema- 
tician Georges-Louis Leclerc, Comte de Buffon (1707-1788), who posed it in 1733 
and solved it in 1777. Imagine you are standing in the middle of a very large, flat 
surface (think of a basketball court) that has been ruled. That is, the surface is 
covered by an array of parallel, straight, uniformly spaced lines. In your hand you 
hold a needle of length equal to the spacing between the lines. You then toss the 
needle into the air (giving it any amount of spin and tumbling you wish—the more, 
the better) and watch it randomly fall onto the surface and eventually come to rest. 
What is the probability the needle lies across a line (as opposed to lying totally within 
the space between two lines)? 

Buffon never actually calculated a probability for this question, but rather the 
odds that a coin tossed onto a tiled floor would land inside a tile, or across adjacent 
tiles. This was a popular game called franc-carreaux (‘fair square’), one surely 
known to De Moivre. Later Buffon recast the question into one of a needle tossed 
onto a ruled surface, and suggested that one could just as well use a checkerboard 
and a sewing needle. In any case, the number z never entered into any of Buffon’s 
calculations. 

Then, in 1812, Laplace calculated the probability to be 2 and observed that the 
Buffon Needle Problem (as it was then known) could be turned on its head to provide 
an experimental method to determine the value of pi (physicists doing math!). That 
is, if we toss the needle n times and observe a total of c crossings of the needle with a 
ruled line, then (for large n) 


c 2 
n 
or, 
2n 
ry —. 
Cc 


I won’t, here, take you through a theoretical derivation of this, but rather (in the spirit 
of the last lines of note 11) see if you can write a Monte Carlo simulation of the 
needle tossing process (for, say, 100 million tosses), and in this way ‘check’ the 
theoretical result. The concept of Monte Carlo simulation, so routinely used today in 
science and engineering (in math, too'*) can be said, therefore, to have started with 
Laplace (and Buffon, too, as he had earlier used a similar experimental approach on a 


'3 For an example of mathematicians using Monte Carlo to check a theoretical result, see the end of 
the paper by Herbert Bailey and Duane Detemple, “Squares Inscribed in Angles and Triangles,” 
Mathematics Magazine, October 1998, pp. 278-284. 
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different probability problem).'* A number of people followed-up on Laplace’s 
suggestion but, in the days before electronic computers, physical fatigue usually 
set-in after just a few thousand needle tossings.'° To suggest doing 100 million 
tossings would have gotten you declared to be delusional. You can compare your 
work with mine at the end of the book, where you’ll find (Appendix H) both a 
theoretical solution (much like the solution to the Ed-and-Gail problem) and a Monte 
Carlo simulation. 

What made the Ed-and-Gail problem ‘easy’ to solve (and the needle problem, 
too) is the uniformity of the random variables involved. When the pertinent random 
variables are not uniform things move up in complexity, and we get into situations 
where Galileo would have been almost surely stumped. Here’s an easy to understand 
example of that from a modern-day military problem. When an intercontinental 
ballistic missile (ICBM) with a thermonuclear warhead is launched at a target 
thousands of miles distant, it will miss its intended aim point by a random distance. 
The standard mathematical description of this is as follows: if an arbitrary set of x, 
y-axes is drawn with the target at the origin, the x and y miss distances are taken to be 
independent Gaussian random variables, each with zero mean (m = 0 in Eq. 1.5) and 
equal standard deviations (same o in Eq. 1.5). 

The radial miss distance—which is all that really counts as far as the blast effects 
of a nuclear explosion are concerned—can be shown to be described by the random 


variable R = \/ X? + Y”, where X and Y are Gaussian random variables for the miss 
distances along the x and y axes, respectively. The pdf of R is 


frit) = Ge" ,1>0 (1.6) 


'4Tt is a common belief among many that Monte Carlo dates from the late-1940s work of the 
mathematicians Stanislaw Ulam (1909-1984) and John von Neumann (1903-1957). That version 
of the origin of Monte Carlo can be found in two papers published in the 1987 Special Issue of Los 
Alamos Science: N. Metropolis, “The Beginning of the Monte Carlo Method,” pp. 125-130, and 
Roger Eckhardt, “Stan Ulam, John von Neumann, and the Monte Carlo Method,” pp. 131-143. In 
fact, however, Buffon had used a Monte Carlo approach (called statistical sampling) nearly 
200 years earlier to study the famous probability puzzle called “The St. Petersburg Paradox” (see 
any good introductory probability text for a discussion of the ‘paradox’ and/or Stephen M. Stigler, 
“Stochastic Simulation in the Nineteenth Century,” Statistical Science, February 1991, pp. 89-97). 
What Ulam, von Neumann, and Metropolis more correctly pioneered was the application of high- 
speed electronic computers to Monte Carlo calculations involving atomic physics. See, for exam- 
ple, Nicholas Metropolis, et al., “Equation of State Calculations by Fast Computing Machines,” The 
Journal of Chemical Physics, June 1953, pp. 1087-1092. See also Herbert L. Anderson, “Metrop- 
olis, Monte Carlo, and the MANIAC,” Los Alamos Science, Fall 1986, pp. 96-107. 


'S One famous example of that sort of thing, which made the astonishing claim of getting the first six 
decimal digits of pi correct, was done in 1902 by the Italian secondary schoolteacher Mario 
Lazzarini, using just 3408 tosses. An interesting analysis of the likelihood of that claim is in Lee 
Badger, “Lazzarini’s Lucky Approximation of 2,” Mathematics Magazine, April 1994, pp. 83-91. 
For quite some time mathematicians thought Lazzarini to be a fraud, but a more recent view is that 
he was simply a jokester (see Hans van Maanen, “Lazzarini’s Little Sticks,” Skepter (Dutch), 
Autumn 2018, pp. 8-12). 
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where R is called a Rayleigh random variable,'° after the English mathematical 
physicist John William Strutt (1842-1919), who discovered it in 1887 in connection 
with studies in the physics of sound waves. Strutt is better known to modern 
physicists and mathematicians as Lord Rayleigh; unlike Lord Kelvin, who was 
elevated to the peerage for his scientific work, Rayleigh had an inherited lordship. 
(The two were friends—see Fig. 1.6—and Rayleigh was every bit Kelvin’s scientific 
equal; he received the 1904 Nobel prize in physics.) The Rayleigh pdf is ubiquitous 
in mathematics and physics, and you’ll see it again in Chap. 7 when we talk about 
electronic noise in a radar receiver. Galileo in the seventeenth century, De Moivre 
and Buffon in the eighteenth century, and Laplace in the nineteenth century, could 
not have even begun to imagine how their ‘mathematics of gambling’ would become 
the ‘mathematics of atomic war’ in the twentieth and twenty-first centuries. 

Upon De Moivre’s death, his friends arranged for his burial in the London 
cemetery at St. Martin-in-the-Fields. His passing marks the end of Chap. |, but it 
is just the start of our story on the evaluation of the probability integral. 


Generating Values for Gaussian Random Variables 

When Eq. (1.5) describing a Gaussian random variable was introduced, I left 
unanswered how to generate a sequence of random numbers with a Gaussian 
probability density. With the assumption that we have access to a generator of 
random numbers that are from a uniform process (over the interval 0 to 1), 
there are two popular algorithms that transform those numbers into Gaussian 
numbers. They are the Box-Muller, and the Marsaglia algorithms, named after 
the American statisticians George Box (1919-2013) and Mervin Muller 
(1928-2018), and George Marsaglia (1924-2011). Both algorithms generate 
a sequence of random numbers from a Gaussian process with zero mean and 
unit standard deviation. As mentioned in the text, such a sequence is then 
easily transformed into a sequence with any mean and standard deviation 


(continued) 


'6Tt would require more development of probability theory than I want to do in this book to show 
you how to conclude that the square root of the sum of the squares of two zero-mean, independent 
Gaussian random variables with identical standard deviations results in a Rayleigh random variable, 
but you can find derivations in just about any modern introductory engineering probability 
textbook. The missile miss distance problem has the nice feature of providing a physical interpre- 
tation to the o in the Rayleigh pdf of (1.6). If we imagine a circle of radius d centered on the target, 
military analysts define d to be what is called the CEP of the missile (denoting the somewhat 


awkwardly named circular error probable), if d is the radius of a circle such that the probability the 


missile lands inside that circle is 5. That is, fe Set/ 20° dy = b This is easily solved to give 


d=o,/In(4) =1.1770; o is (very approximately) the CEP. Typical values for the CEP of 
ICBMs range from tens of meters to hundreds of meters. For a one-megaton warhead intended to 
destroy a city (a ‘soft’ target), it hardly matters what the CEP is but, to preemptively attack an 
adversary’s ICBMs before they are launched from hardened silos, missile CEP very much does 
matter. 
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Fig. 1.6 Lord Kelvin 

(on the right) gets a tour in 
July 1900 of Lord 
Rayleigh’s home laboratory 
at Terling Place, Essex, 
England. (Image courtesy of 
the AIP Emilio Segré Visual 
Archives, Physics Today 
Collection) 


desired. For the Box-Muller algorithm, get two independent values from a 
uniform generator (call them X, and X,), and then Y,= 

sf —2In(X;) cos(2mX2) and Y2 = \/ — 2 In(X;) sin(2xX2) are two indepen- 
dent values from a Gaussian process sequence with zero mean and unit 
standard deviation. For the Marsaglia algorithm, get a pair of numbers X, 
and X, from the uniform generator (over the intervals —1 < X, < 1 and — 


1 < X < 1) such that S=X? + X3 <1. Then Y) =X14/—"®™ and Y2 = 


Xo a are two independent values in a sequence from a Gaussian 
process with zero mean and unit standard deviation. Figures 1.7 and 1.8 
show the result of implementing the Box-Muller algorithm, and the Marsaglia 
algorithm, respectively, with each plot created from a sequence of length one 
million numbers. The two figures are histograms of the one million numbers 
sorted into 1000 bins. Both algorithms are generally acceptable but, while both 
calculate logarithms, the Marsaglia algorithm avoids also calculating sines and 
cosines (which are computationally demanding). Thus, the Marsaglia algo- 
rithm executes much faster than does Box-Muller and so is, in my opinion, the 
preferred choice. 
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number of numbers in each bin 


-5 0 5 
1,000 bins, one million numbers 


Fig. 1.7 Box-Muller algorithm 


number of numbers in each bin 


5 0 5 
1,000 bins, one million numbers 


Fig. 1.8 Marsaglia algorithm 


Part I 
Evaluating the Probability Integral 
(The Beginning) 


Chapter 2 ®) 
Laplace’s First Derivation se 


A direct evaluation of the probability integral was done in 1774 by Laplace. In that 
work’ he began by reducing the probability integral to another integral, and that was 
the integral he evaluated. In modern terms, here is what he did.” 

Starting with (eo dx, make the change of variable u= e- and so 


In(u) = — 4x? or, x=4/—2In(u) and so & = —xe-* = —,/—2In(u) u or, 
a du 
dx = Var Thus, 
ee ax= [iy 2 
0 1 u,/ — 2 In(u) 
and so 
[owas ff ae Ga) 
0 0 /—2In(u) 


It is this last integral on the right-hand-side of (2.1) that Laplace evaluated. 

The line of attack in this chapter consists of establishing several individual, 
seemingly unrelated preliminary results that will, almost miraculously, come 
together at the end to give us our result. We start by looking (just because we can) at 


"You can find a nice review of what Laplace was doing that year in Stephen M. Stigler, “Laplace’s 
1774 Memoir on Inverse Probability,” Statistical Science, August 1986, pp. 359-363. Whether 
Laplace was really the first to calculate the probability integral is explored in more depth in Chap. 3. 
The analysis I’m about to show you is an elaboration of a paper by the Canadian physicist 
Napoleon Gauthier, “Evaluating the Probability Integral,” The Mathematical Gazette, July 1988, 
pp. 124-125. Gauthier’s intent was to show that a beginning calculus student could understand how 
to handle Laplace’s integral. 
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alm, m(u-#—1) 


which leads to the indeterminant oo @ 0, which as it stands is not in the proper form 
for applying L’H6pital’s rule. We can get it into proper form by first writing x = 
which transforms the limit into 


ae 
m?’ 


a form that leads to the indeterminant ° which is just what we need for L’H6pital’s 
rule. Thus, since 


u~2x = gin(u-*) =e 2xIn(u) 


we have 


That is, 
Jim m(u# - 1) = —2In(u) 
and, with some simple algebra (that I'll leave for you) this becomes our first result: 


lim Sway (2.2) 


mL =P 0. yim(1 — us) / —2In(u) 


Next, consider the integral 


2 Laplace’s First Derivation 


and make the change of variable u = sin™(8). Then, 


du 2s - m—1 
qo —m sin (8) cos(8) 


and so 


m sin™~'(0) cos(@)d0 


rere aa 


n/2 
a sin eG s(0)d0. 
Jo cos? 


Simplifying, we have our second result: 


un 


af Tie 


—Ssu=m sin™(8)d0, m > 0. 
0 


25 


(2.3) 


For our final preliminary result, we define the family of integrals indexed on m as 


follows: 
n/2 
n= f sin™(0)d0,m=0, 1,2,3,,,,. 
JO 
Using the well—known formula for integrating-by-parts, 
b 


b “b 
[vav=tw) -| v du, 


where we write u = sin™ ~ '(8) and dv = sin (0)d0. Then 


du =(m— 1) sin™~ *(0) cos(®)d@ and v= — cos(0). 


NIA 


(2.4) 


n/2 
cos(8)(m— 1) sin™~?(0) cos 
+ [ c0s(0)(m— 1) sin™2(0) cos(0)a0 
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n/2 
L, =(m— nf cos 7(@) sin™~ *(0)d0 
0 


or, 
n/2 n/2 
Tin = (m — yf sin™~?(8)d0 — (m— if sin™(0)d0 
or, 
In =(m— 1)In—-2 — (m— 1)In 
or, 


In{l + (m— 1)] =(m— 1)In—2 
or, at last, we have the recursion 
min =(m— 1)In—2,m=2, 3,4, .... (2.5) 
We start m at 2 in (2.5) since that says 2I, = Io, where Ip is the first member of our 


family of integrals (there are no integrals I,, for m < 0). In particular, notice from 
(2.4) that 


and 


NIA 


n/2 
I, =| sin(8)d8 = — cos(0) 
0 


values we’ll use in just a moment. 
Multiplying through (2.5) by L, — 1, we have 


mmlm—1=(m—1)Im—Im—2- (2.6) 
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If we write (2.6) out for the first few values of m, we see a remarkable thing: 


m=2: bh =hh= 5, 


m=3 : 3k) =2bI) =hbb= - 


m=4 : 44h =3bb =2bh =hl = 5, 
and so on. For any m > | we see that 
T 
mMnln—1= 5° (2.7) 


Now, as m — ov, I, and I, — ; become indistinguishable and so we can write 


lim miylm—1 = lim m2, = 
m— oo m—oco 


Nia 


or, 


lim /mln = 7 (2.8) 


m—oco 


We are now ready to put all of these separate results together. We have, from (2.1) 
and (2.2), 


re [Sa [in gta 
fe —21n Pee (1 ur) 


or, assuming we can reverse the order of integration and the limit,* 


co 1 a 
i. e-* dx = lim I = du. 
0 moo \/m 0 (1 — ua) 


This assumption is not always legal. The Riemann integral is, itself, defined as a limiting process, 
and so to reverse the order of the limit and the integral is to reverse the order of two limiting 
operations. A dramatic example (from electric circuit theory) of how doing such a reversal can fail is 
discussed in my book Mrs. Perkins’s Electric Quilt, Princeton 2009, pp. 27-36. For a way to avoid 
this potential objection, see the commentary on Gauthier’s paper by the Scottish mathematician 
Darrell Desbrow in The Mathematical Gazette, June 1990, pp. 169-170. A variation on Gauthier’s 


derivation is in Nick Lord, “An Elementary Single-Variable Proof of [ oe dx= V2n,” The 
Mathematical Gazette, June 2003, pp. 308-311. 
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Using (2.3), we can replace the last integral to get 


m— oo m—oco 


oo . n/2 
i e-* dx= lim Sam | sin™(0)d0= lim /mln 
JO m Jo 


and so, recalling (2.8), we have 


oo a Tt 
i, e *dx= 3 
0 


and with this Laplace had derived what De Moivre had deduced forty years earlier. 
Stigler (see note 1) reports that in 1809 Gauss declared* Laplace’s derivation to be 
“elegant,” high praise indeed from one of the greatest of all mathematicians. 
Whether you consider this somewhat laborious derivation to be “elegant” is, how- 
ever, of course a matter of taste. As you’ll see in Chap. 4, Laplace himself may well 
have had some doubts about just how elegant his 1774 analysis actually is. 


‘Tn his book Theory of the Motion of Celestial Bodies Which Revolve About the Sun in Conic 
Sections. 


Chapter 3 
How Euler Could Have Done It Before rack 
Laplace (But Did He?) 


This chapter is a three-for-one, in that it will show you how Euler could have derived 
the probability integral three different ways. That is, well before 1774 Euler had all 
the technical tools needed to do a derivation—but apparently did no such thing.' 
Well, you might ask, why didn’t Euler do any of the math you are about to see, and so 
beat Laplace to the punch? Well, maybe (in my opinion, almost certainly) he did but, 
if so, it was in passing and just another ‘easy’ (for Euler!) calculation along the way 
to some other goal he had his eyes on and, without much thought about it, he roared 
off in pursuit of that goal. And, of course, one might also ask why didn’t Laplace do 
any of these three derivations instead of his rather lengthy 1774 analysis of the 
previous chapter? Well, who knows? As you’ll see in the next chapter, with 
Laplace’s second derivation, he more than makes up for his initial ‘miss’ (if miss 
it was—after all, recall Gauss’ opinion of Laplace’s derivation at the end of Chap. 2). 


3.1 Derivation #1 


My first imagined ‘Euler derivation’ is a fairly extensive elaboration of a short note” 
by the American mathematician Murray Spiegel (1923-1991), well-known to all 
engineering, physics, and math students (probably numbering in the millions) who 


— x2 


"As one writer expressed this, “whether Euler ever focused on the function e~* may be questioned 
... That is [De] Moivre’s feat.” See J. van Yzeren, “Moivre’s and Fresnel’s Integrals by Simple 
Integration,” The American Mathematical Monthly, October 1979, pp. 691-693. Be sure to see the 
final paragraph of this chapter for more on this point. 

°M. R. Spiegel, “Remarks Concerning the Probability Integral,” The American Mathematical 
Monthly, January 1956, pp. 35-37. Spiegel’s derivation was rediscovered half-a-century later by 
Paul Levrie and Walter Daems, “Evaluating the Probability Integral Using Wallis’s Product 
Formula,” The American Mathematical Monthly, June-July 2009, pp. 538-541. 
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have a copy of Schaum’s Math Tables (edited by Spiegel). Spiegel starts by recalling 
the definition of Euler’s famous number, e, the base of the natural logarithms: 


lim (1+) "=e". (3.1) 
n—oo n 
So, writing 


s= | edt, 
0 


Spiegel then goes on to say “it appears plausible to consider the [following family of] 
integrals” indexed on n, defined as 


because, from (3.1), 


We do this because, as Spiegel writes, “Heuristically it would seem” we can write 


lim s,=8= | edt. 
n—-oc 0 


If we let t= /n sin(x), where n = 1, 2, 3, ...—and so dt = \/ncos(x)dx—we have 


or 


si=vaf” eo gl x)dx,n = 1, 2,3, sax: (3.2) 


A recursion for the S,, can be found as follows, via integration-by-parts. 
Let u= cos7"(x) and dv = cos (x)dx, and so du= — 2n cos*" ~ '(x) sin (x)dx and 
v = sin (x). Then, 
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bl] 
2 


n/2 n/2 
ip cos7"*"(x)dx = cos7"(x) sin(x)| + 2n f cos 7"~!(x) sin?(x)dx 
0 0 


0 


n/2 
=2n / cos 7"~!(x) [1- cos *(x)]dx 
0 
n/2 n/2 
= an f cos™*—!(x)dx — 2n f cos 7"! (x)dx 
0 0 
and so 
n/2 n/2 
(1+ 2n) [ cos 21 (x)dx=2n | cos 7"~ !(x)dx 
0 0 


or 


7 cos *"*!(x)dx = ou “ cos *"~!(x)dx (3.3) 
5 ~ 142n Jy 


From (3.3) we see 


and so° 


om /2 5) on —2 om/2 
2n+1 _ n iH 2n-3 
| cos “"'' (x)dx = (= = i) & = t) f cos (x)dx. 


Notice that, for any n, the numerators of the product terms in front of the integral on 
the right are all even, and that the denominators are all odd. If we keep doing this, we 
eventually reach 


i cos" (xd = = 7) é - 7) (= = 3) ae (3) i ee 


and, since 


31f this isn’t immediately clear, then make the substitution n = k — 1| in (3.3), write (3.3) out in 
detail, and then change notation back from k to n. 
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z 
5 


=1 
0 


oe cos(x)dx = sin(x) 


we have, from (3.2), 


S.=valsrr) (G1) (oma) G) 


and so 


a ae (2)(4)(6), ,, (2n) 
fn eS = lev" ByG)a)...2n—nei): 


This gives us 


= ea (2)(4)(6),,, (2n) vn 
s= un, (3)(5)(7)... (2n— 1) 2n + | = “| Re) 


As n — oo the expression in the far-right square brackets of (3.4) becomes re And 


from equation (B.5) in Appendix B, we know that 


tim —__(2(4)(6).»5(2n)_ fe 
noo (3)(5)(7)...(2n—1)V2n+ 1 2 


Combining these two results in (3.4), we (perhaps Euler, too?) immediately have the 
probability integral: 


s= | e-*dt= Lie: 
0 2 


3.2 Derivation #2 


When Spiegel presented his derivation, he was replying to an earlier paper* that 
offered an alternative derivation of the probability integral, a derivation he thought 
just a bit too sophisticated for undergraduates. As Spiegel diplomatically put it, he 
thought his method “was perhaps a little more direct from the point of view of 


FACT, Coleman, “The Probability Integral,” The American Mathematical Monthly, December 1954, 
pp. 710-711. Albert John Coleman (1918-2010) was professor of mathematics at the University of 
Toronto when he wrote his paper. 


3.2 Derivation #2 33 


motivation.” I think Spiegel was correct in saying that, but even so the earlier paper 
is full of ingenious arguments that are just too good to pass-by. So, here (with a lot of 
elaboration) is how Professor Coleman did it, using only math that was all known to 
Euler well before Laplace in 1774. 

Coleman starts by defining a family of integrals indexed on m (with n some 
positive integer—to ensure the integrals converge—with a specific value we’ll pick 
after a few more steps): 


In =| xme dy, (3.5) 
0 

Integrating (3.5) by-parts (u=e ~9x"/2 and dv = x'"dx) we quickly arrive at 

m+1 oo co Lm+1 5 oo 5 
Re ee) * —nxe -/2gx = 9 _ | xmt2e —m'/2dx 
m+ 1 0 0 m+ 1 m+ 1 0 
n 
= apd 


Replacing every m with m — 2 (which leaves the truth of the statement unaltered) 
this becomes 


or 
In = I>. (3.6) 


Coleman next observed that, for any t, 
o 2 
| x™(x —t)?e~™ /dx > 0, 
0 


which is immediately seen to be true because each factor in the integrand is 
individually always non-negative. Why this is of interest, however, is probably not 
terribly obvious (and Coleman doesn’t offer an explanation, no doubt thinking it 
‘self-evident,’ and this is where I think Spiegel had his first thought about some 
motivation being called for). 

Here is what I think Coleman had in mind. If we expand the integrand, we have 


[e<) E [oe] 5 co 2 lo. ¢) 2 
i x™ (x? —2xt+ ft je" dx = | xmte ma ay 2 xmtle r+ ef x™e 27>0 
Jo 0 0 0 
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or, 
Tnt2 — 2tIng1 + In > 0. (3.7) 


Coleman then immediately writes “Hence, Im41 <~WImIm;2 .” an assertion I think 
(as no doubt did Spiegel) that would, as it stands, leave most high school seniors and 
even college freshmen in a stunned silence. Some elaboration (perhaps a Jof!) is 
definitely needed here, and here is one way to see where Coleman’s inequality 
comes from. 

The left-hand-side of (3.7) is quadratic in t, and a plot of it as a function of t will 
give a curve (a parabola) that is always above the t-axis (because of the >0 
condition). That is, the curve never crosses the t-axis and so there are no real 
solutions (a crossing gives areal solution to the quadratic) to Im 42—2tln+1+ ae =0. 
Rather, the solutions must be complex. Since the solutions are given by the quadratic 


formula 
2In+1 + \V/ 4P 4, = Al mIm+2 
t= 


2, 


then we immediately see that, for complex solutions, we must have 
AP. —4ImImy2 <0 
or, 
Eq = lala 
and, just like that, we have Coleman’s inequality: 


Inti < VImlm42- (3.8) 


Coleman next pulls some more big rabbits out of the hat I’ve labeled as (3.8). To 
start, set m = n in (3.8) to get 


That < V/ Tndnae (3.9) 


and then set m = n — | in (3.8) to get 


Aer ae (3.10) 


Next, looking back at (3.6), if we set m = n+ | we have 


Inet =Th-1, (3.11) 
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and if we set m = n + 2 we have 


n+1 
Thi2 = “nh In 
or, 
n 
I, = a 2: 


Now, put (3.11) into (3.10) to get 
Th < leila 
or, 
In <In4t. 


And then put (3.12) into (3.9) to get 


n / on 
Tho < ag ala =In42 ced 


i a 
or, since 75 < 1, then 


That <In42. 
Combining (3.13) and (3.14), we have 


In+2 > Th+t > In. 
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(3.12) 


(3.13) 


(3.14) 


(3.15) 


In words, since n is an arbitrary (up to now, but not for much longer) positive integer 
then any three consecutive members in the family of integrals defined by (3.5) satisfy 
the double inequality of (3.15). Now, at last, after all these preliminaries, we are 


ready to get down to business. 
Recalling (3.6), 


we replace every m with m — 2 to get 


m—2-1 


In-2= 


In-2-2= —, Im-4 
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which says 


Notice that the numerators in the factors on the right of this equation are all even, and 


that the subscript on the integral I on the right is odd. 
We continue doing this until we reach I,. That is, we end-up with 


ky 2k=2, 2), (4). 2K) 
n n n n 


Ing = 1 


a total of k factors 


Next, and at last, we pick a value for n. Let n = 2k + 1. Then, 


(2)(4) (2k) 


I — 
2k+1 (2k + 1) 


For m = 1 and n = 2k + 1, we have from (3.5) that 


Changing variable to u = (2k + 1) = (and so du = (2k + 1)x dx or, dx = 


have 


I= wer" eu is! a oe 
ae (Qk+1)x 2k+1 J, Seni 


Thus, 


(2)(4) ..- (2k). 


Ixy = (ak + 1) 


Putting (3.16) aside for now and again recalling (3.6), 


du 
1 


De+Ix? we 


(2k+ 


(3.16) 
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Next, set m = 2k + 2 (and so now m is even) to get 


(2k + 2-1) (2k + 1) 
Toq42 = Ix 2-2 = Se ee 
Now 
(2k —1 
In, = 2k—2 
and so 
2k+1 2k—1 
Inpi2 = ( deh Deo. 


Notice that the numerators in the factors on the right are all odd, and the sub-script on 
the integral on the right is even. This continues, then, until we reach Ip. That is, 


Mk+1%k-1, gly _ (1)() -- @k+ Di, 
n n n n 


Iny2 = 


a total of k+1 factors 


Then, for m = 0 and n = 2k + 1 we have, from (3.5), 


=f e~ C+ dx 
0 


or, with the change of variable we =(2k+ 1) x (and sou = /2k + 1x or dx = Tea) 


Ip = eos [era 
UT ORE fo 
Thus, 


lg — WG) 2k+1) 1 ie 
, (+1! VIk+ 1 So 


—2" du. (3.17) 


As we did with (3.16), put (3.17) aside for now (for just a bit) and let’s go through 
this process just one more time. Returning once more to (3.6), set m = 2k + 3 and so 


(2k + 3-1) 2k+2 
n 


Inxn43 = In,43-2= Inxs. 


38 3 How Euler Could Have Done It Before Laplace (But Did He?) 


Thus, 


(2k+ 1-1) 


bi = = 


2k 
Ing1-2= p Pk-! 


and therefore 


2k + 2. 2k 
n 


In43 = OF e-1- 


The numerators in the factors on the right are all even, and the subscript on the 
integral on the right is odd. If we continue this process we arrive at 


Ak+2,2k, 42) _ (2)(4).. (2k +2) 


bs I. 
n n n nkt+1 : 


Iok+3 = 
a total of k+1 factors 


As before, set n = 2k + 1. As we found the first time we did this—when we derived 


(3.16)—we got I; = SeeT and so 


(2)(4) ... (2k + 2) 
(2k + 1)? 


Tok 43 = (3.18) 


We now return to the double inequality of (3.15) and insert our results from (3.16), 
(3.17) and (3.18). That is, we write 


(2)(4)(6)--.(2k+2) | (G)(5).--@k+1) 1 [rove 
Chi Zep V2k +1 Jo 
5, (2)(4)(6) - (2k) 
(2k + 1 a 


and watch this become 


ANG). is 2) (1)(3)(5)...(2k +1) mea fee > (2)(4) 


x (6)... (2k) 


or 
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PHANG) RFA) , (1)(3)(5)... (2k - 1) VII [eau > (2)(4) 


x (6) ... (2k) 


or 
(2)(4)(6) ... (2k +2) > [ro ” 
(1)(3)(5)... (2k — I) V2k + 1(2k +1) ~ Jo 
. (2)(4)(6) ... (2k) 
(1)(3)(5)... (2k — 1) 2k +1 


(2)(4)(6) ... (2k) 2k +2 i 
DO a yea (met 7) > ff ee 


7 (2)(4)(6) ... (2k) 
(1)(3)(5)... (2k — 1) 2k +1 


and so we finally arrive at 


(2)(4) ... (2k) 1 — 
DG Gey art i)>f ed 


5 (2)(4) ... (2k) 
(1)(3)...(2k—1)/2k +1 


(3.19) 


Now, let k > oo. 

As a look at equation (B.6) in Appendix B shows, the probability integral in the 
middle of (3.19) is bounded from both below and above by quantities that are each 
approaching «fs As mathematicians amusingly put it, the probability integral has 
been trapped between upper and lower bounds that are closing-in on each other like 
the jaws of a vise, and so the integral is squeezed to the value ia This is the 
mathematical version of ‘being caught between a rock and a hard place.”° 


5]’ll take the imagery of the so-called squeeze (or sandwich) theorem as obvious but (for the pure at 
heart), it can be given a more rigorous treatment. See, for example, Inequalities by Michael J. Cloud 
and Byron C Drachman, Springer 1998, pp. 14 and 128. At the end of Chap. 6 you’ll see another 
example of the squeeze theorem in evaluating the probability integral. 
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3.3. Derivation #3 


The first two of this chapter’s imagined Euler derivations are, admittedly, just a bit 
on the complicated side. Here, as the final, third offering of ‘how Euler could have 
done it,’ is a beautifully clear and concise approach to evaluating the probability 
integral. As shown in Appendix D, Euler was in possession—well before 1774—of 
the gamma function which is, in modern notation, 


T(n) =| e *x"~ dx, 
0 


and he knew the particular value of [(n) for n= 5. That is, Euler knew (see equation 
(D.7) of Appendix D) that 


Thus, putting n= 5 into the gamma function integral, we have 


rG)= e7*x2 ieee] x tare [oS a ox= Va 
2 0 0 


If we now change variable to x = uw (and so dx = 2u du) we have 


00 2 00 2 oo 
e€ € 2 
2u du=2 | <udu=2 | e "du=/n 
| Vw o wu 0 


and so, just like that, we have 


i e7" du= tk 
0 2 


To repeat the question asked in the paragraph that opened this chapter, did Euler 
know how to calculate the probability integral before Laplace? I think he did, but my 
earlier comments are just a bit dodgy. Another writer has been far more forceful in 
expressing his belief that Euler did know: “[TJhere is no doubt [my emphasis] that 
well before [1774] Euler knew the equivalent result if { = In(x)}~ Way = ja 
This view has been argued again more recently by another writer who, after 
referencing Gauss’ praise of Laplace’s 1774 calculation (see the final sentence of 


®Nick Lord, “Intriguing Integrals: An Euler-Inspired Odyssey,” The Mathematical Gazette, 
November 2007, pp. 415-427. Euler published this integral in 1772, 2 years before Laplace’s 
calculation. 


3.3. Derivation #3 41 


Chap. 2), wrote “Mathematicians who were contemporaries of Gauss were very well 
aware of the fact that [Euler calculated the probability integral before Laplace].”’ As 
an example of this supposed common knowledge, Di Biase tells a heretofore untold 
story of the Italian mathematician and astronomer Barnaba Oriani (1752-1832). Ina 
copy of Gauss’ 1809 book, in which he calls Laplace’s 1774 derivation “elegant,” 
Oriani handwrote this note (in Latin) next to Gauss’ praise: “The elegant theorem, 
attributed to the illustrious Laplace, was in reality first found by Leonard Euler.” 


7Fausto Di Biase, “A Correction of the Historiographical Record on the Probability Integral,” 
posted on arXiv.org (the Web-based, open-access archive operated by the Cornell University 
Library), October 2019. The copy of Gauss’s book, containing Oriani’s scribbled note, is in the 
library of the ETH Ziirich (Einstein’s alma mater). 


Chapter 4 ®) 
Laplace’s Second Derivation se 


In 1812 Laplace returned to the probability integral and, perhaps thinking that his 
1774 derivation (see Chap. 2) could be improved upon, achieved that goal in truly 
spectacular fashion. Here’s how. 

But first, let me show you a nifty little trick with integrals, one not usually 
discussed in freshman calculus but nonetheless easily grasped. We’ll use it here in 
just a moment, and again in Chap. 6. Suppose we have the function f(x, y) = g(x)h 
(y). That is, f(x, y) is what mathematicians call a separable function. Next, imagine 
we integrate f(x, y) over a square region in the x, y-plane, to form 


i2y / “t(x,y)dx dy = / , i "g(x)h(y)dx dy = | “a(x) { / “n(y)dy fa 


The inner integral at the far right, a definite integral, is simply a number, and so we 
can pull it outside the x-integral. Thus 


{es} 0} 


That is, 


{ / ‘a(sjax} { i ‘ney =| i “e(x)h(y)dx dy. 


A product of two one-dimensional integrals has become a single double integral. 
Now we start by considering the square of the probability integral, I. That is, 
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r (Lee (fers) 


or, using our ‘nifty trick’ in reverse, 


r= f ee Max ay= | {[ eax hay, (4.1) 


Next, change variable in the inner integral to x = yt where, of course, y in the inner 
integral is treated as a constant because the dummy variable of integration in the 
inner integral is x (not y). Then we have dx = y dt and so 


p= [| fo Oly atkay (4.2) 


or, assuming we can reverse the order of integration’ in (4.2), 


p= | {/ ye" ay ha 
0 0 


In the new inner integral, we now treat t as a constant because y is the dummy 
variable of integration. The inner integral is easy to do, by changing variable to 


u = y’ (t* + 1) which says du = 2y(t? + 1)dy and so dy = en So, 


°° -y? (+1) — ~ —u du — 1 = —u eos 
[ ey [ve We+h 2wWihs|, © oY 2(2 +1) 


Thus, 


© 1 1 f* 1 1 
P= dt= dt= <= tan ~'(t 
| 2(2 + 1) f ea 2° (t) 


‘Engineers, physicists (and mathematicians, too), in the privacy of their offices with the doors shut 
to avoid judgmental eyes, routinely and without hesitation reverse the order of integration in double 
integrals just to see what happens. How could anyone with just an ounce of curiosity resist? If 
something good does occur, then a theorem named after the Italian mathematician Guido Fubini 
(1879-1943) is invoked that (hopefully) justifies the reversal. (If, however, something mathemat- 
ically awful occurs, they simply don’t tell anybody what they did because they remember the tale in 
the Preface of the two doomed mathematicians on Death Row.) 
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And so, just like that, we have 


You'll recall that Gauss called Laplace’s somewhat cumbersome first derivation of 
1774 “elegant.” What do you think he thought of this derivation? When a modern 
writer discussed this method, he declared it to be the “simplest” one possible, and I 
am inclined to agree.” 


?Hirokaza Iwasawa, “Gaussian Integral Puzzle,” The Mathematical Intelligencer, 2009 (number 3), 
pp. 38-41. Iwasawa starts his analysis with our equation (4.2)—declaring the integral in (4.2) to be 
one with a “magic integrand’—and says that to ask where that integrand comes from is like asking a 
“magician to give away [his] secret.” And yet, starting with our equation (4.1) makes it perfectly 
clear what the ‘secret’ is to (4.2). See also Constantine Georgakis, ““A Note on the Gaussian 
Integral,” Mathematics Magazine, February 1994, p. 47. 


Chapter 5 
Generalizing the Probability Integral oe 


Mathematicians like to generalize their results, to make them as widely applicable as 
possible. So here we’ll take a break from deriving the probability integral, and shift 
our attention to establishing some expressions that have the probability integral as a 
special case. Specifically, consider the claim 


| aa cos(2ax)dx = YE, (5.1) 
0 


Clearly, (5.1) reduces to the probability integral when a = 0. 

What Ill show you next is how Cauchy derived (5.1) in 1815, using a then 
common method of integration that he eventually came to consider to be fundamen- 
tally flawed. This concern was, in fact, a major factor in his development of the 
techniques of contour integration (discussed in Appendix F and used in Chap. 12). 
Before showing you how Cauchy derived (5.1), let me first set the historical stage 
for you. 

As the eighteenth century closed and the nineteenth opened, there was a common 
belief among mathematicians that (to quote a well-known math historian): “Any 
identity holding for a range of values of the [real] variables involved in [the identity] 
was universally valid, even when complex numbers ... were substituted for the 
[real] variables. An author appealing to this principle would usually use some such 
phrase like ‘the generality of algebra’ or ‘the generality of analysis’; many such 
appeals can be found in the writings of Cauchy’s immediate predecessors.” 


'F, Smithies, “Cauchy’s Conception of Rigor in Analysis,” Archive for History of Exact Sciences 
36 (no. 1), 1986, pp. 41-61. As an illustration of the casual attitude of Cauchy’s contemporaries 
toward rigor in analysis, Smithies offers this dramatic example, one that would make any modern 
(good) high school math student cringe. In his 1810 textbook on the differential and integral 
calculus, the French mathematician Sylvestre Francois Lacroix (1765-1843) claimed that 


4, =14+3+ x +... for any value of x even though the equality obviously fails for all [5| > 1. 


a-x 


Frank Smithies (1912-2002) was an English math historian at Cambridge University. 
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As the second decade of the new century began, Cauchy began looking for a way 
to avoid what he considered the sloppy math of just boldly substituting imaginary 
quantities into expressions derived using real quantities—at the end of this chapter 
I'll show you an example of doing that to evaluate two amazing integrals that almost 
surely will appear to you to be completely unrelated to the probability integral—and 
Cauchy’s great success in achieving that goal is discussed in Appendix F. 

But none of that professed reluctance meant that Cauchy was above using the 
“generality of algebra’ argument, himself, when it suited his needs! That’s exactly 
what he did, in fact, in deriving (5.1). (In Chap. 10 Pll show you a modern way to 
derive (5.1) that avoids the “generality of algebra’ argument.) Cauchy started with 
the result known to De Moivre and derived by Laplace: 


[- e *dx=yr. (5.2) 
Make the change of variable y = x + m, where m is some constant (and so dy = dx 
and x = y — m) to arrive at 
[- ce O-ahdy = e7Y t2ym—m ay = /q 

or, 

[- e7¥ —™ e2ymdy = V/z. (5.3) 
In the same way, if you let y = x — m you quickly find that 

[- ee Me 2Mdy = Vz. (5.4) 


So, adding (5.3) and (5.4), we have 
/ enV —m (rym 4 9 -2™) dy — I0/m. (5.5) 


Now we commit the ‘sin’ (in Cauchy’s eyes) of introducing an imaginary quantity 
for a real one. Specifically, let a = mi, i= V — 1. So, v= — m’, and (5.5) becomes 
(asm= — ia) 


/ eye (e~ 2 si e*Y) dy — 2/n. (5.6) 
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By Euler’s identity we have 
— i2ay iday _ 
e + e'*Y = 2 cos(2ay) 


and so (5.6) becomes 
/ e~¥ +" cos(2ay)dy =2V/t 
which reduces to 


i e~» cos(2ay)dy = /ne~* 


— oo 


or, as the integrand is even (and changing the dummy variable of integration from y 
to x) 


which is (5.1). 
There are, as you might suspect, many other similar generalizations of the 
probability integral. Consider, for example the class of integrals 


un) = [ xe—* dx,n>0, (5.7) 
0 
of which I(0) is the probability integral.* We start with the observation 

< eee) = (2n— 1)x2"-2—-*" — 2xx2t—1e-” = (2n — hee Sc we 


Integrating this from zero to infinity, we have 


= (x nT ek *)dx = (2n~1) x2" 2e-Xdx—2 | xMe-™ dx (5.8) 
0 dx 0 0 


I’ve taken this example from LI. G. Chambers, “The Integral [)~e e-*' dx,” The Mathematical 
Gazette, March 2005, pp. 52-54. 
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or, using (5.7) on the right-hand-side of (5.8), 


Lk (xe Jax = (2n— 1)I(n — 1) — 21(n) =0 


because the left-hand-side of (5.8) is 


‘| < (te ate 0. 
o dx 0 
Thus, 
— 2n-1 _ 2n(2n— 1) _ 2n(2n — 1) 
I(n) = 5 I(n—1)= 2n(2) I(n—1)= dn I(n— 1). 


Applying this recursion to itself a few times, we quickly see that 


1(n) = SPF) 


and so, since I(0) = 4./z, we have 


[tee ea G Vi). (5.9) 


We can do just a bit more with (5.9). Recalling equation (D.1) from Appendix D (the 
definition of the gamma function), 


T'(n) =| e-*x"~'dx, 
0 


and setting x = y”, we can write 
T(n) =| ey ydy =2 [ eV yn dy. 
0 0 


Now, let n=j + 4. Then 


r(i+3)=2 oF ltt) lay =2 | e-Y yPidy, 
0 0 
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That is, switching back to n for j, and to x for y, we have from (5.9) that 


i ae Go, 9 (en)! 71 
r(n+5)=2/ e yay =2 ar (5 VF) 


or, 


r(n+3) =O’ vanzo, (5.10) 


an expression of great use in ‘higher mathematics.’ Setting n = O gives us 
rG) =,/m, derived at the end of Appendix D and used in the last of Euler’s 
imagined derivations (in Chap. 3) of the probability integral. 

Another historically interesting example of a generalized probability integral is 
equation (5.11), due to Laplace: 


| eM hans 5 y/o, a20,b20 (5.11) 
- 2Va 


which is the probability integral for the special case of a= 1, b = 0.3 (in Chap. 10 Pll 
show you a derivation of (5.11) for the special case of a = 1, b > 0, a derivation that 
is significantly shorter than is the general derivation of (5.11).) Now, for our final 
calculations of the chapter, P’I] commit (more than once!) “Cauchy’s sin’ with (5.11). 
To start, let’s set a equal to the imaginary value i= — 1 , and set b = 0, to write 


| e-*dx= V5 
0 2Vi 


So, using Euler’s identity, 


[ {cos(et) =i sing?) 3 = M4 


= va T\) vi = wh 
2|cos (7) +i in ae V2 +i/2 


1 + wl—i 
Tan )1 —i) ea 


You can find a detailed derivation of (5.11) in my book Inside Interesting Integrals (2nd edition), 
Springer 2020, pp. 138-139. The assumption in that derivation is that both a and b are real, 
non-negative constants. 
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So, equating real and imaginary parts of the start and the end of this sequence of 
expressions, we have what is known to mathematicians as the Fresnel integrals*: 


J oseeyax= | sin(x?)dx= [5 =0.626657.. (5.12) 
0 0 


To finish this chapter in spectacular fashion, let’s do those two ‘amazing’ integrals I 
mentioned at the beginning. Specifically, let’s set both of the originally real con- 
stants a and b in (5.11) equal to the unreal i= V — 1. Then, (5.11) becomes 


i eM than | ot ax= 3 fe NAT = Lim -2i _ : VE ott 
; ; 2\i 2 ji 2 a 


= LV 21 L pe | fen), 
2 ef 2 2 
or, using Euler’s identity on both the integrand and the final right-hand-side, we have 


[ c0s(t + Baxi f sin(x? +) dx = ve [cos (2 +7) —i sin(2+3)]. 
0 0 
(5.13) 


Equating real and imaginary parts on both sides of (5.13), we have the amazing 
formulas 


[ cos (x? + 5) dx = VE cos (2+) = — 0.830599... (5.14) 
and 
[sine + B)ax= VE sin(2 +3) = 0.309036... (5.15) 


There is no question that this example, and the Fresnel integral calculations, look 
pretty slick but, like Cauchy, should we worry about the legitimacy of simply 
stuffing imaginaries into formulas derived with assumption of real quantities? In 
other words—are (5.12), (5.14), and (5.15) correct? 


“The Fresnel integrals are named after the French scientist Augustin Jean Fresnel (1788-1827) who, 
in 1818, encountered them during some work in optical physics. These integrals had, however, been 
first evaluated no later than 1781 by Euler, using the gamma function (for details, see my book An 
Imaginary Tale, Princeton 2016, pp. 175-180), in an analysis of the physics of a coiled spring. Alas 
(for the resumé of Euler), his analysis wasn’t published until long after his death, and that explains 
why the integrals are instead named for Fresnel. There is a bit more discussion on the Fresnel 
integrals in Chaps. 6 and 10. 


5 Generalizing the Probability Integral 53 


We can study that question experimentally by simply evaluating the integrals 
numerically and comparing the results with the theoretical values. To numerically 
compute the Fresnel integrals is easy using the Wolfram on-line integrator, with the 
numerical results being 0.626657 for both integrals, in excellent agreement our 
theoretical results. For the two amazing integrals of (5.14) and (5.15), however, 
we do have to be careful to avoid running into division-by-zero problems. To do that, 


write 
(oe) 1 ee) 
te 
0 0 1 
and then, in the middle integral, change variable to y= 1. Then wy =— 4 and so 
dx = —x"dy or, dx = — %. Thus, 


1 1 0° 2 1 
1 1 d cos (x +3) 
y) 2 y x 
7 cos(x a)ax= f c0s( 5+ y \(- *) | — zx 


and similarly 


1 © sin(x? +4 
| sin(x? +)d= f Se ae 
0 x 1 x 


(5.16) 


Xx 


oo (1 + x2) cos(x? + 5) 
xX 
=| d 
I 


x2 


and similarly, 


ee) 00 DNs 2 1 
Fi sin(x? + J) dx= f (l+x Bits +45) dx. (5.17) 
0 x 1 x 
Using the Wolfram on-line integrator, the final integral in (5.16) evaluates as — 
0.830599, while the final integral in (5.17) calculates as 0.309036. Both numerical 
evaluations are in outstanding agreement with the values produced by the theoretical 
expressions. 

To finish this little side-trip into playing with Laplace’s generalization of the 
probability integral, we would have to be completely lacking in curiosity not to next 
wonder about the integrals {>° cos(x* — 4)dx and {,~ sin(x? — 4,)dx. Pursuing this 
question goes pretty much the same as did our last set of calculations, but there is one 
extra twist at the end. We start by setting a = i and b = — iin (5.11), to write 


54 5 Generalizing the Probability Integral 


7 0 x 0 7 - 
fe ie (oan: 
oy 2e? \e# Ge) en cos G i +i sin (3) 
1 Tt TC 1-i 1 jx . 
G2) i a raallte: )=(2 (; as 1) 2 ree a 
—— + t — 
Equating real and imaginary parts, we therefore conclude that 


a 2 1 _ 1 Tu oe Z x 1 = 
[ evs(s 5)ex= 3 /5= sin(x?— dx =0.084808.... (5.18) 


To numerically check (5.18), we proceed as before, transforming i to f, ‘. to avoid 
division-by-zero concerns. I'll let you fill-in the easy details and, if you’re careful 
with the signs, you'll see that, as before, 


fore) co (] 2 2 1 
i cos (x a)ax= [ (lhe = (x x) dx 
Jo x 1 x 


but now 


oO oo 2). 1 s 2 t 
| = (e _ =) ica | (x ) al 2) aes 
0 x 1 x 


Given these two expressions, the Wolfram on-line integrator returns the value 
0.084808 for both integrals, in satisfying agreement with theory. It is interesting to 
specifically take note of the fact that 


while 


/ sin(x’ +33)dx¢ f cos (x? +5)dx. 
0 x 0 x 


“Cauchy’s sin’ may be a dubious procedure—but nonetheless it often works 
(be cautiously skeptical when committing it, however, and be prepared to do some 
numerical verification of the results). 


Chapter 6 ®) 
Poisson’s Derivation Chop for 


What I'll show you here is almost certainly the way the probability integral is derived 
each year on classroom blackboards in 99% of college calculus, engineering, and 
physics courses, worldwide. It is the standard textbook presentation, too, and it is the 
way I first learned how to do the probability integral. I still remember that occasion 
because it was the source of not just a little embarrassment, as my education came a 
bit later than it should have. 

I was in my first year of graduate (!) school at Caltech when, one day, the integral 
came up in a lecture in an electrical engineering class. The professor simply assumed 
everybody knew the value of the integral (I didn’t), and when I later expressed my 
surprise at the prof’s assumption a fellow student (who, unlike me, had done his 
undergraduate work at Caltech) expressed his surprise at my surprise. He happily 
told me that every Techie with a heartbeat knew that integral by the end of their 
freshman year. While I had no reason to doubt that claim, I soon realized the main 
goal of my classmate was to wage a bit of psychological warfare on somebody who 
had come to Caltech from elsewhere. Still, I quickly realized that in this particular 
matter he was right—the probability integral is something every first-year under- 
graduate in physics, engineering, and mathematics should indeed know how to 
do. So, I tracked down a derivation in a textbook, and here is what I found. 

The derivation uses, at its start, a clever trick that we’ve seen before (in Chap. 4) 
but then it suddenly veers off along an entirely new path that is nowhere to be found 
in any other derivation of the probability integral. It is named after the French 
mathematician Siméon Denis Poisson (1781-1840), even though he didn’t (appar- 
ently) actually publish it. We know of it as being due to Poisson only because the 
French mathematician Jacob Sturm (1803-1853) attributed it to Poisson in a text- 
book published many years after Poisson’s death. 

We start by recalling Eq. (4.1) in Chap. 4, the development of the square of the 
probability integral: 
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p= [o fe (P4¥") dx dy. (6.1) 


In Chap. 4 our next step was to make a change of variable but now, instead, we’ll 
think geometrically about what the double integral means. It signifies the integration 


of e~ +9") over the entire first quadrant of the x,y-plane, as both x and y vary from 
zero to infinity. Poisson’s great idea was to skip the change-of-variable step used by 
Laplace, and to instead change the co-ordinate system from rectangular to polar 
(a trick used and explained in Appendix D—see note 2 in that appendix).! 

Since the numerical value of the integral in (6.1) shouldn’t depend on the 
co-ordinate system we choose (that, after all, is an arbitrary decision), we can use 
whatever system is most convenient. As you’ll see next, in polar co-ordinates it is 
really easy to do the integral. The first quadrant of the infinite plane is swept-out in 
polar co-ordinates by the variables r = ,/x? + y? (the so-called radius vector) and 0 
(the polar angle), defined on the intervals 0 < r < oo and 0<0< 5. So, using the 
differential area patch rdrdO@ appropriate for polar co-ordinates (as discussed in 
Appendix D) we see that (6.1) becomes 


m/2 poo 5 ore) 3 n/2 n f® 5 
p= | i eFrardo= f re" i, do a5 [ re "dr 
0 0 0 0 2 Jo (6.2) 


99,2 


and so, as one writer put it, “the answer magically pops out”: 


= | shea ia. (6.3) 
0 2 


Isn’t that clever? Indeed! But there is a bit of mystery to it, too, because, as the same 
writer wrote in an earlier paper, “This is a beautiful argument, but one never 
encounters it outside of this one example [my emphasis].”° What does that 


' About doing this sort of thing, one author wrote (with reference to Poisson’s derivation) “Certainly 
many mathematicians have felt it strange to have to resort to geometrical notions alien to the 
problem in order to solve it.” See Juan Pla, “A Footnote to the Theory of Double Integrals,” The 
Mathematical Gazette, July 2010, pp. 262—269. I am not sure if I agree with the idea that there is 
something “strange” or “alien” about making a co-ordinate change in a problem involving an 
integration over a region of the plane. 


Denis Bell, “Poisson’s Remarkable Calculation—a method or a trick?” Elemente der Mathematik, 
February 2010, pp. 29-36. 


3Denis Bell, “On the Limitations of a Well-Known Integration Technique,” Mathematics Maga- 
zine, October 1993, pp. 243-244. 
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somewhat enigmatic remark mean? Given as to how changing co-ordinate systems 
instantly unraveled the probability integral, then, as another writer mused, “One 
question that may have occurred to many over the years is: What else can I do with 
[Poisson’s co-ordinate change trick]? The surprising answer to this natural 
question is: Absolutely nothing!”* 

That perhaps astonishing claim is made in the papers by Bell, and in Dawson’s, 
arguing that the only integral that yields to Poisson’s co-ordinate system change is 
the probability integral. To put it bluntly (in circus jargon), they claim Poisson’s trick 
is a ‘one-trick pony.’ The final line in Dawson’s paper is a bit more gracious: “It is 
surprising that so elegant a trick should have one, and only one, application; and it is 
gratifying that that one application should be one so vitally important in mathematics 
and statistics.” And, he could have added, important in engineering and physics, 
as well. 

Now, with all that said, is it then a shock for you to read this opening sentence 
from a paper published years before the papers of Bell and Dawson: “The standard 
device using polar coordinates [my emphasis] to derive [the probability integral] 
can be used [my emphasis] to evaluate the Fresnel integrals.”° Bell and Dawson 
don’t mention this paper, despite the fact Professor Flanders® makes (in my opinion) 
a quite strong case rebutting their claims! Here’s how he did that, in the form of a 
beautiful counter-example. 

We start by defining the two functions F(t) and G(t) as follows, where t is some 
(arbitrary) non-negative variable: 


and 


cw = | gr sin(x*)dx. 
0 


Thus, F(O) and G(O) are the Fresnel integrals, integrals we first encountered in 
Eq. (5.12) in Chap. 5. We then write 


“Robert J. MacG. Dawson, “On a ‘Singular’ Integration Technique of Poisson,” The American 
Mathematical Monthly, March 2005, pp. 270-272. 

>Harley Flanders, “On the Fresnel Integrals,” The American Mathematical Monthly, April 1982, 
pp. 264-266. 

Harley Flanders (1925—2013) had a distinguished career at several institutions, and in particular 
was professor of mathematics at Florida Atlantic University (Boca Raton) when he wrote his paper. 
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F’(t) —G°(t) = (i c0s(x")d ie cos ()ay) 
— (i sin(x?)e ) Cea sin(y")dy 


and so, using the ‘nifty integral trick’ of Chap. 4, we have 


o=[- ee ae [cos(x”) cos(y”) — sin(x”) sin(y”) ]dxdy. 


Or, remembering the identity cos(A + B) = cos (A) cos (B) — sin (A) sin (B), 


w= fe (4¥") cos (x + y’)dxdy. 


Then, using Poisson’s change to polar co-ordinates trick (and with this step the 
claim of Bell and Dawson about the trick being strictly limited to the probability 
integral takes a significant hit), we have 


n/2 ore fore 5 
F’(t) — G’(t) -{ ao [ e* cos(?)rarh = 5 | e~™ cos(r?)rdr. 
0 0 0 
Next, change variable to x = r (and so oe = 2r or, dr = =) and thus 
2 ae _ r ” —tx dx _ a x —tx 
F(t) —G°(t)= 5 i e ™cos(x)r a if e ™cos(x)dx. 


The right-most integral can be done by-parts (but even easier is to just look it up in a 
good set of integral tables!): [ e* cos(bx)dx = A breosl a) ee sin(hrl . Thus, with 
a= —tandb=l, 


and so 
F(t) —G2(t) = (3) —, (6.4) 


Now, for the moment, put (6.4) aside. 
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Proceeding in much the same way, we next write 


=. fre *("+9") cos (x*) sin (y*) dxdy 


or, remembering the identity sin(A + B) = sin (A) cos (B) + cos (A) sin (B), we see 
that 


sin(A + B) — sin(A —B) 


5 = cos(A) sin(B). 

Thus, 

HT SC 0 

sin(x* + y ie sin(x*—y*) _ cos(x2) sin(y?) 
and so 

Fo fy2 ty) cin( yx? — v2 
bain (eeaye) Sing? + y?) = sin(x? —¥4) 4 
0 2 

or, 


G(t) ya) = fo re (x+y? ) sin(x? + y*)dxdy 


-[- ,e G4") sin (x? _ y’)dxdy. 
0 Jo 


Now, our next step is to show that the right-most double integral in (6.5) is zero. That 


is, that 
i ee (x4y") sin (x? — y’)dxdy =0. 


Professor Flanders skips over establishing this point in his paper (he was writing for 
professional mathematicians, and may well have simply taken it as “obvious’), but it 
is an essential step, and so here’s why that double integral vanishes. The region of 
integration is the entire first quadrant of the infinite plane, and every point in that 
region lies in one of the following three sub-regions: (A) on the line y = x, (B) below 
the line y = x, and (C) above the line y = x. For the points in region A, the integrand 
of the double integral is zero, as the sine factor is zero for y = x. For every point 
(x, y) in sub-region C, there is a unique associated point in sub-region B at (y, x)— 


(6.5) 
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notice that these two points have the same value of r because x” + y” = y” + x*—and 
the sum of the integral integrand at these these two points is zero since sin(x? — 
y’) + sin (y* _ x’) = sin (x? y’) sin (x? y’) = 0. Since all the points ‘near’ 
(x, y) are associated with all the points ‘near’ (y, x), we see that a differential area 
patch centered on (x, y) cancels the differential area patch centered on (y, x). Thus, 
the double integral over the entire first quadrant is zero and (6.5) reduces to 


2F(t)G(t) = |e ce sin(x* + y”)dxdy 


If we now again use Poisson’s trick and change to polar co-ordinates, we have 


2F(t)G(t) = | “of | ee sin()rar} 


or 


2F(t)G(t) = 5 | vet sin(r°)rdr. (6.6) 


Changing variable to x = 1 as we did earlier, (6.6) becomes 


Integrating by-parts (or once again turning to integral tables), we see that 


e*{a sin(bx) — b cos(bx)] 
a2 + b? , 


/ e™ sin(bx)dx = 


Thus, witha = —tandb= 1, 


14+t 
and so 
2F()G(t) = (F) — (6.7) 


In (6.4) and (6.7) set t = 0 to get 


F’(0) — G?(0) =0 (6.8) 
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(6.9) 


where, you’ll recall, F(0) and G(O) are the Fresnel integrals. From (6.8) we see that F 
(0) and G(O) are equal in magnitude, but their algebraic signs are not determined. 
That is, (6.8) is satisfied by both F(O) = G(O) and by F(O) = — G(O). Here’s how to 
resolve that ambiquity. 

From (6.9) we see that since 7 > 0 then F(0) and G(O) must be either both negative 
or both positive. If we sketch the curve of the integrand of G(0), that is, the curve sin 
(x), you see that successive zero-crossings are ever closer, and so the areas enclosed 
by each successive half-cycle of sin(x*) form an alternating series of ever-decreasing 
values. A famous theorem in freshman calculus tells us that the sum of such series 
converges, and that the limiting sum is (in this case) positive since the first term of 
the alternating series is positive. That is, G(O) > 0. But that means F(O) > 0, too 
(from the opening sentence of this paragraph). Thus, (6.8) says F(0) = G(O) and so 
(6.9) says F(0) = G(0) = \/Zand we are done, with our success intimately tied to the 
power of Poisson’s change of co-ordinates trick. 

In fact, this elegant analysis has done even more than compute the traditional 
Fresnel integrals. As Professor Flanders observed in his paper, if we back-up to 
Eqs. (6.4) and (6.7), where we showed 


F*(t) —G2(t) = (3) aa (6.4) 
and 
2F(t)G(t) = (3) i= (6.7) 


we can solve these equations for F(t) and G(t) as functions of t (not just for t = 0). 
The results’ are, for all t > 0, 


oo 5 / 2 
F(t) =| e~™ cos(x”)dx = fj ene (6.10) 
0 +t 


’This is a nice problem in high school algebra. Use (6.7) to find G(t) in terms of F(t), and then 
substitute the result into (6.4). This results in a quartic that is quadratic in F°(t). Then, take the 
square-root (G(t), of course, then immediately follows). 
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Fig. 6.1 The regions D1 y A 
and D3 ‘squeeze’ the 
probability integral in region 
D2 as R > 00 

RV2 

D3 
R 
D1 
o > 
0 R RV2 x 

and 


Oita n /V/1l+t2-t 
cw = | e * sin(x?)dx= 8 fae" (6.11) 


With (6.10) and (6.11) we see that Poisson’s trick has allowed us to solve an entire 
class of integrals, and that is surely enough to elevate Poisson’s ‘trick’ to something 
a lot more substantial than just being a ‘one-trick pony’ that works only for the 
probability integral.® 

To end this chapter, let me show you one more example of the usefulness of polar 
coordinates in evaluating the probability integral. The approach I am about to show 
you is discussed in the paper by Chambers (see note 2 in Chap. 5), and yet again ina 
more recent one.” Figure 6.1 shows three regions in the first quadrant of the x, 
y-plane: a circular region of radius R, a square region of side-length R, and a circular 


SRight years after Flanders’ paper was published, another paper connecting the probability integral 
and the Fresnel integrals appeared: Robert Weinstock, “Elementary Evaluations of i e-* dx, 
fo cos x2dx, and tr sin x2dx,” The American Mathematical Monthly, January 1990, pp. 39-42. I 
haven’t discussed Professor Weinstock’s paper here for two reasons. First, it doesn’t use Poisson’s 
trick (the whole point of this chapter) and, second, his results are not as general as Flanders’. 
Weinstock (1919-2006), a professor of physics at Oberlin College (Ohio), used Leibniz’s formula 
for differentiating an integral, and when we get to that approach in Chap. 10 P’ll mention Weinstock 
again. 

°Hsuan-Chi Chen, “Evaluating the Probability Integral by Annular Partition,” The Mathematical 
Gazette, March 2016, pp. 111-113. 
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region of radius RV2. These three regions are denoted by D1, D2, and 
D3, respectively. What we are going to do is integrate the function f(x, y) = 


ere ae (r,8) =e~" over each of these three regions, using polar coordinates 
in the circular regions (D1 and D3) and rectangular coordinates in the square region 
(D2). 

We start our calculations by making two observations: 


(a) The function f(x, y) =e— (y") =F (r,8) =e" is everywhere positive 


(b) Circular region D1 is smaller than square region D2 which in turn is smaller than 
circular region D3. 


So, it is immediately clear that 


[ft 0)r dr d8< [fossa dy < [fs 0)r dr dO. (6.12) 


D3 


We evaluate these three double integrals as follows: 


; 4\ RK 5 
Di a Re" r dr dO= # pe Trdr=§(- feo) | =4(1-e-®’). 


D2: 
R R 2,.2 R 2 R 2 R 2 R 2 
[ [Pr axay= | ae! eYaybax= [ eax | e *¥ dy 
o Jo 0 0 0 0 
2 


={ f'e“ax} 


n/2 pRV2 RV2 : 
i, | eFrardo= 5 | e*rdr= 5 (- se") 
D3: Jo 0 2 Io 2 2 
Putting these results into (6.12), we have 


I e : T 
_a-R’ — x? fy ,—2R° 
a0 e )<{ fe ax} <5 (1 e ) 


and so, if we let R — oo, both the upper and lower bounds of the double-inequality 
approach 7 and the integral-squared is squeezed to 7 . That is, 


Rv2 


0 
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and so 


Part II 
Evaluating the Probability Integral 
(An Interlude) 


Chapter 7 ®) 
Rice’s Radar Integral oe 


In this chapter, and the two that follow it, we’ll take a break from our relentless quest 
for derivations of the probability integral, and instead enjoy what I think of as an 
interlude. An interlude in which issues related to the probability integral will take 
center-stage. In this first chapter of the interlude, and in the next one, as well, we will 
face head-on the central puzzle of the probability integral, the puzzle mentioned by 
G. H. Hardy in the quotation that opened the Preface—why is there a question mark 
after the equals sign in the indefinite integral [ e~ dx = ?Hardy knew full-well that 
the answer to this question had been provided long-ago by the French mathematician 
Joseph Liouville (see note 3 in the Preface); Hardy also knew that Liouville’s work 
was, for some reason, not well-known when he gave his talk in New York City in 
1928. In the next chapter Pl] show you how Liouville provided an explanation for 
that question mark, but first, in this chapter P’ll show you how the problem of 
integrating a function involving e-* arises in a problem of great technological 
interest. To start, let me share with you an e-mail I received in March 2022 from a 
reader of my book Inside Interesting Integrals (Springer 2015, 2020). Sent by a 
retired engineer, he wrote 


As an engineer trained at MIT in the 50’s, I highly value analytic solutions these days when 
people rely on numerical predictions, often without understanding the physics of the 
problem being modeled ... I am wondering if you, or someone else you know about, has 
ever been able to analytically solve Rice’s integral for the probability of detection of a 
sinusoidal signal in Rayleigh noise: 


‘As two modern authors have related, “{Liouville’s memoirs] seem to have fallen into an oblivion 
which they certainly do not deserve.” That is a quotation from Hardy’s 1905 book The Integration 
of Functions of a Single Variable (see A. D. Fitt and G. T. Q. Hoare, “The Closed-Form Integration 
of Arbitrary Functions,” The Mathematical Gazette, July 1993, pp. 227-236). 

? Rayleigh noise is any intrusive electronic signal with a random amplitude that has a Rayleigh pdf 
(see the Epilogue to Chap. 1 for more on this point). 
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oR (Ce) (=) 
Pp = el Ww /Ip{ — )dR 7.1 
5 | < (% (7.1) 


where Vy is the threshold voltage (to be explained, soon), A is the sine wave 
amplitude, g is [a measure of the noise power], and Ip is the zeroth-order modified 
Bessel function? 


Both Rice, in his original paper, and Skolnik [in his book], state that ‘This integral cannot be 
evaluated by simple means, and numerical techniques or a series approximation must be 
used.’ ... I wonder if you or anyone else has been able to successfully solve this integral 
analytically. 


There is a lot in that e-mail that needs explaining—and [ll soon do that—but first 
here is what I wrote back to my correspondent, a week later: 


Thanks for your interesting e-mail; it brought back some memories. Back in the late 1960s I 
worked for several years as a radar systems engineer at the Ground Systems Division of the 
Hughes Aircraft Company in Southern California, and my days then were filled with ‘signal 
detection in the presence of noise’ problems. Skolnik’s book was the Bible on radar among 
my colleagues. So, with all that as background, at least a little discussion of Rice’s integral 
would have been a natural for inclusion in /nside Interesting Integrals, and your note made 
me wish I had thought of doing that. 


To answer your question, no, I don’t believe anybody has ever found any way to evaluate the 
detection integral other than as a power series. I suspect mathematicians can prove that there 
is no way to formally express the integral in terms of elementary functions. I'll have to look 
into that. 


And so I did, with the next chapter discussing what I found. But before we get to 
Chap. 8, let me elaborate just a bit on some of the language in my correspondent’s 
e-mail. 

Rice was Stephen O. Rice (1907-1986), an electrical engineer at the Bell Tele- 
phone Laboratories who published a multi-part paper titled “The Mathematics of 
Random Noise” in The Bell System Technical Journal in the mid-1940s, in which his 
(now) famous signal detection integral first appeared. 

Skolnik was Merrill Skolnik (1927—2022) who, for decades, was considered to be 
the ‘Dean of Radar Authorities’ (he was Superintendent of the Radar Division of the 
famous Naval Research laboratory (NRL) in Washington, DC for many years). In 
1973 I was at NRL on a post-doctoral appointment and had the great pleasure of 
talking with Skolnik at an informal office party at the Lab. It was enormously 
exciting to meet the man whose well-thumbed book (Introduction to Radar Systems, 
McGraw-Hill 1962) had been on my office desk for years during my time at Hughes. 

The modified Bessel function’ of the first kind of order-zero is written as the 
power series 


>There is a vast family of functions called Bessel functions, named after the German astronomer and 
mathematician Friedrich Wilhelm Bessel (1784-1846), which are solutions to a certain differential 
equation (named Bessel’s equation but which had, in fact, been studied by others long before 
Bessel). This seemingly odd situation perhaps occurred because Bessel used the equation in his 
astronomical work, and so brought the differential equation to the attention of an audience much 
broader than it had been before. Bessel functions occur in a multitude of applications in addition to 
astronomy and radar (the theoretical study of FM radio, for example, starts with Bessel functions— 
see my book The Mathematical Radio, Princeton 2024). 
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x2 x4 x6 
Ip(x) =1+ a + ed? + ed20G? eee 


(7.2) 


A central question that Rice considered in his analyses was that of ‘how, when 
examining an electrical signal that might be the sum of a sinusoidal waveform and 
random noise, or might be just noise alone, does one decide if the sinusoid is actually 
present?’ This is not a made-up homework problem, but one that occurs in the 
analysis of a radar echo. Rice published in the mid-1940s, but he had been working 
on this and related questions for years, with the military needs of World War 2 radar 
providing a Jot of motivation. 

Radars detect the presence and range of remote objects by transmitting periodic 
bursts of electromagnetic energy, and then measuring the time interval from the 
transmission of a burst to the reception of a reflected echo. The transmitted signal is a 
very high frequency sinusoidal wave of brief duration (typically a microsecond or 
so).* If a remote object (typically an airplane) is 93 miles distant, for example, the 
round-trip distance from radar-to-object-to-radar is 186 miles which takes (at the 
speed of light) one-thousandth of a second (1000 is). This process takes place in the 
real-world of electronic noise, which is what any electrical signal not intentionally 
generated is called (one source of electronic noise is due to the random thermal 
motion of electrons in the radar’s own circuitry). So, the detection of an echo is done 
in the presence of noise, and the fundamental question of “how does a radar receiver 
distinguish a noise-only fluctuation from the fluctuation caused by the presence of a 
reflected sinusoid?’ immediately steps forward. 

The standard technique is to take the signal in the radar’s receiver and threshold 
it. That is, only signals large enough to exceed a predetermined amplitude will be 
declared to contain a sinusoidal echo. However, now and then random electronic 
noise alone will exceed the threshold value and so generate what radar engineers call 
a false alarm. Too high a rate of false alarms is the radar version of ‘the boy who 
cried wolf’ so often that after a while everybody ignored him. The rate at which false 
alarms occur can be reduced by increasing the threshold value, but of course that 
makes it more difficult for an actual target echo to exceed the threshold, which means 
some real targets echoes will be missed. We have, therefore, a classic example of a 
trade-off between false alarms and missed detections. 

Rice treated the decision problem as a random process and, after some enor- 
mously clever analysis (which Ill skip here since this is not a book on radar 
engineering), arrived at the definite integral in (7.1) that is central to determining 
what should be the proper threshold setting, as a function of how noisy is the 
environment the radar is operating in. This integral, now called Rice’s integral, 
can be written, putting (7.2) into (7.1), as 


“Tf the frequency of the radar is (to pick a typical value) one gigahertz (1000 MHz), then a 
one-microsecond transmission burst consists of one thousand sinusoidal cycles. 
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- R_ (es ry Cy @ 
Pp [, oc a 1+ 2 + red? + Pefiage dR. (7.3) 
Making the change of variable r= RE A’ we have, after differentiation, 
x _IR_R 
dR 2p @ 
and so 
dR= *o xdx. 
Also, x = oo when R = ov, and when R = Vy we have x = vet = which [ll call 
a. Thus, (7.3) becomes 
2 4 6 
~R GR oc, 29 
a230 9 _\e 9 ; 
Po= [ oO 1+ a + ate42 + 52ea20G fice R xdx 
or, 
2 4 6 
= eS) & & 
ay Ae ” @ ; 
Pp =2f xe 1 t y + e42 + Ped2e6? Trai satis dx. (7.4) 


which says 
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where m and n are constants determined by A and @. Thus, (7.4) becomes 


ee ae mx?—n (mx?—n)” (mx?—n)’ 
aes | xe 1+ 1 + eq? + edteg? +...~dx. (7.5) 


From (7.5) we can immediately write 


Po= cf xke7* dx (7.6) 


where the cx coefficients are functions of A and @ and, as mentioned before, the 
lower integration limit, a, is a function of A, g, and Vy. As you’ll see in the next 
chapter, with the lone exception of the k = | case the indefinite integrals [ xke~*' dx 
cannot be expressed in terms of a finite number of elementary functions. That is why, 
as Rice and Skolnik claimed, the definite integral for Pp in (7.1) has to be evaluated 
numerically. 

To end this chapter on an historically amusing note, Rice’s work was originally 
intended to be his doctoral dissertation at Columbia University. After submitting it to 
an unnamed academic department and having it declined (‘we don’t study such stuff 
here’), Rice tried again with a different (unnamed) department and got the same 
gloomy result. At which point he gave up, returned to Bell Labs, and never received 
a doctorate from Columbia. Since there are three obvious possibilities for the 
academic departments Rice approached (math, physics, and electrical engineering), 
then each one of the two ‘misguided’ (I would prefer to use the alternative word 
dumb here, but politeness restrains me) departments can claim that ‘it wasn’t us that 
made the unfortunate decision to reject Rice’s work’). The whole affair is now 
looked upon as similar to giving a randomly selected man in a firing squad a blank, 
so that all can feel that it was he who fired the non-fatal shot.° But, just to show that 
justice was ultimately not denied, many years later (1961) Rice’s undergraduate 
alma mater, Oregon State University, awarded him an honorary doctorate in recog- 
nition of his vast body of contributions to the hugely important field of applied 
probability. And thus was Columbia’s double-goof finally put right. 


>I am repeating here a story told (in a memorial tribute to Rice) by David Slepian (1923-2007), a 
well-known Bell Labs mathematician. Slepian admitted he had been unable to conclusively verify 
this tale—but it does ring all-too-true to any who have jumped through the graduate school hoops of 
writing a dissertation. 


Chapter 8 ®) 


Liouville’s Theorem That [ e —*'dx Has No 
Finite Form 


In the 1830s Liouville (see note 3 in the Preface) proved the following theorem: 


If the indefinite integral f f(x)e®dx can be evaluated in terms of a finite number of 
elementary functions then the result will be of the form ae” where P(x) and Q(x) are 


: . : 1 
either co-prime polynomials or are constants. 


I'll not prove this theorem here—you can find an elementary treatment of it in Fitt 
and Hoare (note | in Chap. 7)—rather, we’ll simply take the theorem as true and 
limit ourselves to exploring its implications. There are three elaborations needed at 
this point. 

First, what is an “elementary function”? We’ll use the following definition: “An 
elementary function is one which can be constructed by means of any finite 
combination of the operations addition, subtraction, multiplication, division, raising 
to powers, taking roots, forming trigonometric functions and their inverses, taking 
exponentials and logarithms. In short, no matter how complicated the function, if we 
can write down all its terms, the function is elementary.”* There is, actually, a fair 
amount of ‘wiggle-room’ on what qualifies (or doesn’t) as an elementary function. 
Kasper (note 1), for example, observes that “Newton asserted that a quantity like 
1/(x — a) could not be integrated since to do so would involve a transcendental [that 


"An older but still quite interesting history of this theorem is given by Toni Kasper, “Integration in 
Finite Terms: The Liouville Theory,” Mathematics Magazine, September 1980, pp. 195-201. There 
is, I think, some a priori plausibility to the theorem: if we can actually find some finite A(x) such 
that ff f(x)e®dx = A(x), then it is difficult to see how aA =f (x)ee could possibly be true unless A 
(x), itself, already contains the factor e®™, 


?1’ve taken these words from D. G. Mead, “Integration,” The American Mathematical Monthly, 
February 1961, pp. 152-156. David Godfrey Mead (1922-2016) was professor of mathematics at 
the Pratt Institute (New York City) when he wrote his paper, and he embraced my claim that 
Liouville’s theorem has some inherent “plausibility” to it. He opens his paper with the words “‘it 
[Liouville’s theorem] can be applied by the college freshman” who would find the theorem, even 
without proof, to be “natural.” 
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is, non-algebraic] quantity, the logarithm. For Newton, therefore, the integration 
could not be done since it is impossible to perform in strictly algebraic terms.” 
Second, the polynomials P(x) and Q(x) are assumed to be co-prime. That is, they 
have no common factors. For example, P(x) = x? — 4x? + 5x — 2 and Q(x) = i= 
7x + 12 are co-prime because P(x) = (x — 1)°(x — 2) and Q(x) = (x — 3)(x — 4), but 
if Q(x) = x? —4x+3= (x — 1)(« — 3) then P(x) and this new Q(x) are not co-prime. 


So, when we write on we are assuming that any common factors (if any) have 


already been cancelled to arrive at the reduced forms we call P(x) and Q(x). 
Third, any polynomial N(x) of degree n can always be factored into a product of n 
first-degree factors: 


N(x) = (x — @1)(K — 2)... (K — Gy) 


where the a’s are called the zeros of N(x). (The zeros are the values of x such that N 
(x) = 0.) If a zero is repeated r times in the product form of N(x), that zero (call it «) 
is said to have multiplicity r, and so we can write N(x) as 


N(x) = (x — a)'h(x) (8.1) 


where h(x) is the product of all the remaining factors. Since P(x) and Q(x) are 
co-prime, we see that P(x) and Q(x) will have no zeros in common. If we differen- 
tiate (8.1) we have 


N'(x) =r(x — a)’ 'h(x) + (x —a)"h’(x) 


N(x) =(x —a)"~ '[rh(x) + (x — a)h’(x)]. (8.2) 


Notice, carefully, that the expression in square brackets on the right-hand-side of 
(8.2) may have one or more zeros but none of them can be x = a. Do you see why 
that is so? Think about this until it is clear! Equation (8.2) thus tells us that if N(x) has 
the zero x = a of multiplicity r, then N (x) has the same zero with multiplicity r — 1. 

With all this understood, we are now ready to apply Liouville’s theorem to some 
specific examples. [ll do this in three stages. As a warm-up, [ll show you how 
Liouville-type arguments ‘explain’ a puzzle that every beginning calculus student 
runs into when first encountering the integration formula 


‘ xotl 
re dx = aoe —1 
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which clearly makes no sense forn = —1, while for the n = —1 case we have the 
seemingly vastly different formula 


[ta [ix In(x). (8.3) 


I use the word puzzle because you'll recall Newton’s feelings about integrations 
involving logarithms. What is so special about n = —1? 

Going through the ‘Liouville analysis’ for that first example will illustrate some 
of the kinds of arguments we’ll make when we tackle, for a second example, the 
central point of this book: why the question mark in 


[oma 


And finally, we'll apply Liouville’s theorem to the integrals we arrived at in the 
previous chapter’s discussion of Rice’s integral: 


[xox k=1,2,3,... 


Starting with (8.3), we can establish it by simply differentiating In(x) and showing 
the result is, indeed, 1 . That is, we’ll start by verifying 


Letting y = In (x), which says x = e”, we have 


dy d a ee ae 
dx dx In(x) = & dey ex 
dy dy 


and we are done. This is all very straightforward, but in most students’ minds there is 
still at least a glimmer of puzzlement: why do we seemingly need a ‘new’ function, In 
(x), for the n = —1 case? Couldn’t there be perhaps some rational function of 
polynomials, ie that would work? The following arguments show that this is not 
possible. 


Suppose we assume that we can write 


f2-%8 
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where P(x) and Q(x) are co-prime (share no common zeros), Then, differentiating, 


{ Q2P=P20 GP =P’ 
x Q Q 


or, 


Q’ =x(QP’ — PQ’). (8.4) 


This tells us that Q? = 0 at x = 0. That is, Q has a zero at x = 0. That tells us that P 
(0) # 0 since Q and P are co-prime and so do not have any common zeros. Now, 
suppose the x = 0 zero of Q has multiplicity n, and so we can write Q(x) = x"R(x), 
where the polynomial R(x) is such that R(O) # 0. Putting this into (8.4), we have 


xR? =x [x"RP’ — P(nx"~ cee x"R’)| 
or, with some rearrangement, 
x" [x"R? — xRP’ + nPR + xR’P] =0. (8.5) 


Equation (8.5) is supposed to be true for all x (it is certainly true for x = 0), and for 
that to be the case for x # 0 we see that the expression in the square brackets must be 
zero for all x # 0. That is, 


x"R? — xRP’ + nPR + xR’/P=0 (8.6) 


for all x # 0. Now, imagine that we make x arbitrarily close (but not equal) to zero, 
which means we can make the first two terms of (8.6), and the fourth term of (8.6), as 
small as we wish. This says the third term, nPR, must also be approaching zero as 
x — 0. (Remember, P(x) and R(x) are polynomials of finite degree, and that means P 
(0) and R(O), as well as P(0) and R (0), are finite constants.) That is, R(O)P(O) = 0, 
which means either that R(O) = 0, or that P(O) = 0, or that both R(O) and P 
(O) are zero. 

But none of these conditions is possible, as R(O) = 0 violates our earlier 
conclusion that R(O) # 0 and P(O)) = 0 violates the co-prime assumption that P 
and Q have no common zeros (and we earlier established that x = 0 is a zero of Q). 
So, we are forced to conclude that there simply are no polynomials P(x) and Q 
(x) such that 
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Turning our attention now to the case of 


[ofa 


we assume by Liouville’s theorem that we can write 


poma- Om? (8.7) 


or, 
Qe-" =Q [P(- 2xe~*’) af aed ~Pe-* 


Cancelling the e~ x” factor in each term (which we can do because e~* #0 for any 
Xx), we get 


Q? = — 2xQP + QP’ — PQ’. 
With a little rearrangement we have 
PQ’ = — Q* — 2xQP + QP’ 
or, 
PQ’ =Q|P’ — 2xP — Qj. (8.8) 


Now, suppose Q is a polynomial of some degree greater than zero. That assumption 
means Q has a zero « of some multiplicity k > 1, which in turn means Q' has the 
same zero but with multiplicity k — 1. Since P(a) # 0 because P and Q are co-prime 
and so do not have any common zeros, this all says the left-hand-side of (8.8), PQ’, 
has the zero « with multiplicity k — 1. 

Turn your attention now to the right-hand-side of (8.8). The factor of Q there says 
the right-hand-side has the zero, a, with multiplicity of at least k (the at least is 
because there could potentially be more zeros at x = a in the square-bracket term 
P — 2xP — Q). With this we see that we have an impossible situation: Eq. (8.8) with 
its two sides having the same zero but with different multiplicities. What caused this 
conflict? The culprit is the assumption that started this whole line of reasoning, 
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namely, that Q is a polynomial that has a zero. That assumption must therefore be 
wrong, and so we change our tune and now argue that Q has no zero, that is, Q is not 
a polynomial but rather is a non-zero constant. 

With that correction we then have, of course, Q = 0 and so (8.8) reduces to P= 
2xP —Q=0Oor 


P’ —2xP=Q. (8.9) 


Now, suppose that P is a polynomial of degree n (supposing P is a constant 
immediately leads to a contradiction—do you see it?) Then P isa polynomial of 
degree n — 1, while 2xP is a polynomial of degree n + 1. That immediately tells us 
that (8.9) is impossible, as it is not possible for the difference of two polynomials of 
different degree to be, for all x, a constant (that is equal to Q). We therefore are 
forced to conclude that there simply are no P(x) and Q(x) that satisfy (8.7). 

Finally, we turn our attention to the integrals we found in the previous chapter in 
the discussion of Rice’s integral. As before, we assume we can write 


_y2 P(x) _y2 
xke-* dx = e-* k=1,2,3,.... 8.10 
/ Q(x) oo 


For the k = | case we see that this is possible, as 


[xe Pax= — fe 


and so P= — 5Q, with Q any positive constant, works. Now, what about the k > | 
integrals? 
Differentiating (8.10), we start with 


, Qe (Pe) —Pe-™ 


xke* 


which, with just a bit of algebra, becomes 
PQ’ =Q[P — 2xP — Qx*]. (8.11) 


To see what (8.11) tells us, I’1l now repeat much of the same line of reasoning used in 
the last example, but perhaps that’s not a bad idea (and, eventually, we do encounter 
just a slight twist at the end). 

So, let’s assume Q is a polynomial of some degree greater than zero. That means 
Q has a zero at x = o of some multiplicity m > 1 which means Q has the same zero 
with multiplicity m — 1. Since P(«) # 0 (be sure you understand why), the left-hand- 
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side of (8.11) has the same zero with multiplicity m — 1, while the right-hand-side 
has the same zero with a multiplicity of at least m. Contradiction! Thus, as before, 
we adjust and now assume Q is a constant. Thus, Q = 0 and so (8.11) reduces to 


P’ — 2xP — Qx‘ =0,k=2,3,4,.... 
That is, we have 
P’ =2xP + Ox*,k=2, 3,4, .... (8.12) 
Now, look at that 2xP term on the right-hand-side of (8.12). It obviously has degree 
greater than that of P. But that means (8.12) is then claiming P’ has degree greater 
than that of P, which is absurd. 


We conclude, therefore, that the assumption Q is a constant also fails, and so there 
simply are no P(x) and Q(x) that satisfy (8.10) for k > 2. Done. 


Chapter 9 ®) 
How the Error Function Appeared ce 
in the Electrical Response 

of the Trans-Atlantic Telegraph Cable 


In this last chapter of our interlude, I'll show you a specific, detailed example of what 
Glaisher meant (in his epigraph that opens this book) when he wrote that te e-"'du 
is important to students of the physical world, in addition to being of great interest to 
pure mathematicians. This chapter has been written so as not to make any demands 
on you having prior knowledge of physics, other than what everybody comes to 
know as ‘common knowledge’ by age 10. So... 

Imagine you have a glass of clear, still water on your desk, and that you slowly, so 
as not to disturb the water, submerge the tip of a syringe needle into the water. The 
syringe is filled with bright red ink, and you inject a single drop of it into the water at 
time t = 0. Initially, you see just a tiny red dot in the water but, as time increases, the 
red dot expands into what appears to be a spherically growing pink cloud. After a 
long time, the entire glass of water appears uniformly pink. What has happened, of 
course, is that the ink drop has diffused into the water as a result of the ceaseless 
bombardment of the ink molecules by water molecules. This is an example of three- 
dimensional diffusion. 

Now, imagine that you have a very long, thin electrical cable (we'll actually take 
it to be infinitely long, with just one end, starting at x = 0) with no electrical energy in 
it. Electrical energy is related to the voltage of the cable, and so initially the voltage 
at any point along the cable, at distance x from the x = 0 end, is v(x, 0) = 0. Then, 
suddenly, at time t = 0, you connect the x = 0 end to a battery. If we assume 
electricity (electric charge), like ink, diffuses' into and along the length of the cable, 


' This is not the correct physics for a cable, but is only approximately valid under quite specialized 
conditions. Historically, however, this was the physics considered by William Thomson in his 
enormously influential paper “On the Theory of the Electric Telegraph,” Proceedings of the Royal 
Society, May 1855, pp. 382-399. For his contributions to the Atlantic Cable Project Thomson was 
knighted by Queen Victoria in 1866 (becoming Sir William), and later (1892) was elevated to the 
peerage to become Baron Kelvin (see Fig. 1.6 again). Lord Kelvin was one of the giants in the world 
of physics, and at the end of his life he was buried in Westminster Abby (an honor reserved by 
England for her greatest heroes). 


© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 81 
P. J. Nahin, The Probability Integral, https://doi.org/10.1007/978-3-03 1-38416-5_9 


82 9 How the Error Function Appeared in the Electrical Response of. . . 


the obvious mathematical question is “what is the voltage at an (arbitrary) point 
distance x from the battery, at any time t > 0 ?’ That is, what is v(x, t) for t > 0 ? This 
is an example of one-dimensional diffusion, and was one of the questions considered 
by William Thomson when he explored the mathematical physics of the nineteenth 
century trans-Atlantic telegraph cable. 

That is, Thomson’s problem was to solve (for appropriate initial and boundary 
conditions to be discussed soon) the one-dimensional diffusion equation” 


pies ee (9.1) 


where k is the so-called diffusion constant, a constant determined by the electrical 
parameters of the materials used to construct the cable. The differential equation of 
diffusion involves partial derivatives because v(x, t) is a function of multiple inde- 
pendent variables (x and t). The dimensions of k are easy to determine. On the left- 
hand-side of (9.1) the dimensions of me are volts/second. On the right-hand-side the 


dimensions of Oe are volts/ distance”. Since the dimensions of the two sides of (9.1) 
must be the same, the dimensions of k are distance”/second. I’ll remind you of this 
when we get to the end of the analysis. 

The mathematics involved in solving (9.1) for v(x, t) that we’ll use here* (the 
math used by Thomson) is not normally encountered by a modern student until a first 
course in partial differential equations, but that’s not actually an indication of 
advanced difficulty. Some of the arguments you’ll see here will, indeed, be novel 
to a student who has just finished high school AP-calculus (or even the freshman 
year of college), but they are nonetheless completely accessible to such a student. 
Those arguments may be new to you, but they are not difficult to absorb. 

We start with an idea that can be traced back to 1753 (the year before De Moivre’s 
death), to the Swiss mathematician Daniel Bernoulli (1700-1782). We’ll assume 
that the solution to (9.1) can be written in the form of 


v(x, t) = X(x)T(t). (9.2) 


That is, we'll assume we can separate the independent variables x and t into the 
product of two functions, X(x) and T(t). X(x) has no t-dependence, and T(t) has no 
x-dependence. Why can we assume this? Well, we can assume anything we want in 
mathematics—the proof-in-the-pudding is if the assumption leads to a solution that 
works! Our product assumption will pass that pragmatic test. 

Substituting (9.2) into (9.1), we get 


?Pll not derive (9.1) here, but if you’re curious you can find a derivation in my book Hot Molecules, 
Cold Electrons, Princeton 2020, pp. 57-63. 

Today an analyst would almost certainly use the modern approach of the Laplace transform, but 
that mathematics wasn’t available in Thomson’s day. For the transform solution, see (for example) 
my book Transients for Electrical Engineers, Springer 2019, pp. 137-141. 
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1 dT 1 @x 
Wax ag: (9.3) 


The left-hand-side is a function of only t with no x-dependency, and the right-hand- 
side is a function of only x with no t-dependency, and this can be true for all x and all 
t only if both sides are equal to the same constant. I'll write that constant as —)7, 


where A is arbitrary (writing the constant as —A? forces the constant to be negative). 
Setting both sides of (9.3) equal to —*, we then have 


qT yp 

at’ T=0 (9.4) 
and 

dx 12x = 


The general solutions to (9.4) and (9.5) are easily verified to be (by direct 
substitution) 


X(x) =C; cos(Ax) + C2 sin(Ax) 
and 
T(t) =C3e~** 
where C,, Cs, and C3 are arbitrary constants. Thus, 
v(x, t) =C3e~* "[Cycos(Ax) + Cy sin(Ax)] 


or, combining constants in the obvious way to arrive at the constants A and B, we 
have 


v(x, t) =Ae~** cos(Ax) + Be ~*™ sin(Ax). (9.6) 
Now, before going any further, we need to specify both the boundary and the initial 
conditions we are going to impose on v(x, t). We are going to solve for v(x, t) when a 
unit battery voltage is suddenly applied at the x = 0 end of the cable (think of a 


telegraph key connecting a one-volt battery to the x = 0 end of the cable). That is, 


v(0,t)=1, t>0. (9.7) 
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Further, we’ll suppose the cable is initially without any electrical charge in it, that is, 
v(x,0)=0, x>0. (9.8) 


Alas, if we try to use these two conditions on (9.6), in an attempt to learn more 
about A, B, and 4, we immediately run into difficulties (give it a try). 

To get around this difficulty (A and B come out as variables, not as constants), 
mathematicians have developed a clever trick. What we’ll do is solve the diffusion 
equation using a different choice of boundary and initial conditions, and then Pll 
show you the relationship this solution has with the solution to the problem we are 
actually interested in. So, instead of (9.7) and (9.8), let’s imagine a cable that has 
been connected, at x = 0, to a unit voltage for a very long time (so long, in fact, that 
the cable is fully charged), and then at t = 0 we suddenly connect the x = 0 end to 
what electrical engineers call ground (to zero volts). Thus, 


v(0, t) =0,t>0 (9.9) 
and 
v(x,0)=1,x>0 (9.10) 


Keep in mind that (9.6) is still valid, but is now constrained by the boundary 
condition of (9.9) and the initial condition of (9.10). 
When we apply (9.9) to (9.6) we get 


Ae~**—9 
which, for this to be true for all t, immediately tells us that A = 0. Thus, 
v(x, t) =Be~* sin(Ax). (9.11) 


Since i is arbitrary, then (9.11) holds for all possible choices for Xd (since 2’ is 
squared, this means we can assume 0 < A < oo since using a negative value for 
adds nothing new). Thus, since the sum of two solutions to the diffusion equation is 
also a solution, then if we add terms like (9.11) for all possible i, that is, if we 
integrate over all non-negative 4, we will have the most general solution. Further, for 
each choice of A, B itself could be a different constant. (The word constant simply 
means A and B do not depend on either x or t.) That is, B = B(A). So, the most general 
solution is 


v(x, t) = | “Be sin(Ax)dA. (9.12) 
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We can find B(A) by applying (9.10), which results in 
v(x, 0) =1 -{ B(A) sin(Ax) da. (9.13) 
0 


The question now is, how do we solve (9.13) for B(A), which is inside an integral? To 
answer this, using just the mathematics of Thomson’s day, let’s take a temporary 
break from the diffusion equation and indulge in a little digression into Fourier 
series (see Appendix G). Imagine that f(x) is some (any) periodic function with 
period T, that is, f(x) = f(x + T). Then 


f(x) = ao + y {an cos(n@ox) + by sin(n@ox) } 


n=1 


where what is called the fundamental frequency is given by 


The so-called Fourier coefficients are given by 


> pt 
a, = z/ f(x) cos(n@ox)dx,n=0,1,2,3, ... 


TJ _tp 
and 
> pt 
b, = z/ f(x)sin(n@px)dx,n=1,2,3,.... 
TYJ_tp 


I’m not going to prove any of this (look in any good math book on Fourier 
series“), and will simply ask you to accept that the mathematicians have, indeed, 
established these statements. 

Now, let’s write T = 2L, and so the Fourier coefficients become 


> i nmx 
a, = xf feoos (ax. n=0,1,2,3, oe 


and 


if _ (NX 
b=z f £(x)sin("P*)dx,n = 1, 2,3, ee 


4 An excellent choice is Georgi Tolstov, Fourier Series, Dover 1976. Tolstov (1911-1981) was a 
well-known mathematician at Moscow State University, and the author of numerous acclaimed 
math books. And, again, see Appendix G. 
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because ®oT = 2x says that, with T = 2L, we have 


Oo = 


cs 


The Fourier series for f(x) is, then, 


x)= 340+ > {an 00s("F ) + bn sin() }. 


Inserting our expressions for a, and b,, we have (with u as a dummy variable of 
integration), 


f(x) = ao u)du + ae cos(= *) Lf tw COs (=) du 
+570, sin (=) Lf tw sin (FF) du 


or, 


f(x) = x tf fe u)du + vi Lf sw { cos( =) cos (™) - sin (=) sin (=) \au. 


If you now recall the identity 


cos(«) cos(B) + sin(«) sin(B) = cos(a — B), 


then with 
q = PRU p _ NAX 
me 0 ahaa 
we have 
f(x)= = ° f(u)du + si 2 ° f(u) cos{7™(u—x) bdu (9.14) 
2L —L n=1 L -L L , , 


Now, let L — oo, which means we have a periodic function whose period is the 
entire x-axis! In other words, f(x) is now any function we wish. 

What happens on the right-hand-side of (9.14) as L > 00 ? The first thing we can 
say is, if f(x) is an integrable function (the only kind that interested Thomson who 
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was studying a real, physically constructable system), then it bounds finite area and 
so 


Next, define 
1 
aa § 
and then write A, = A, Az = 2X an A3 = 3A an, .-., An =nA= 7, ... and so on, 


forever, as n — oo. If we write 
r 
AAn =)at1 _ An =F: 


we have 


and so (9.14) becomes (where I’ve dropped the first integral because we’ve agreed 
that it vanishes in the limit L — oo) 


y= 55 ae i u) cos{A,(u — x) }du. 


n=1 


As L— ~ we see that AX, — 0, that is, AA, becomes ever smaller (ever more like 
the differential dh), , becomes the continuous variable i, and the sum becomes an 
integral with respect to 4. (Mathematicians may cringe just a bit at this sort of what 
might be called ‘loose talk,’ but engineers and physicists will at least give it a 
chance.) Since the definition of > allows us to ignore negative values, we thus 
write the L — oo limit as 


f(x) = =f if f(u) cos{A(u — x) }du} dar 


1 


at li . | / ” £(u){ cos(n) cos(hx) + sin(tu) sin(ax)}du da 


— co 


where I’ve again used the identity cos(«) cos (8) + sin (a) sin (B) = cos (a — B).So, 
if we change notation and write v(x, 0) instead of f(x), just to make things look as we 
left (9.13) when we started this digression, we can write 
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v(x, 0) = if cos(tx) f v(u, 0) cos(tu be 


— co 


+i fe sings) { | vu 0) sin(iu)a be, 


We know v(x, 0) = 1 for x > 0, while what v(x, 0) is doing for x < 0 has no physical 
significance (the cable doesn’t exist for x < 0). That means we can feel free to 
specify v(x, 0) for x < 0 in any way we wish that’s convenient. In particular, suppose 
we define v(x,0) = — 1 for x < 0, that is, v(x, 0) is an odd function of x. Since cos 
(Au) is an even function of u, and since sin(Au) is an odd function of u, and since 
(by our recent definition) v(u, 0) is an odd function of u, then 


(9.15) 


[ v(u, 0) cos(Au)du = 0 


— oo 


and 


/. v(u, 0) sin(Au)du = 2f vu 0) sin(Au)du. 


— oo 


Thus, (9.15) becomes 


iy. sin) {= [vu 0) sin(iuu fe 


or, as v(u, 0) = | in the inner integral as the dummy variable u varies from 0 to oo, 


we have 
2 f, : 
i= [ Ef sn(aaau | sin(Ax)da. (9.16) 
0 (TSo 


Now, if you haven’t noticed it yet, we have just found B(A) ! To see this, compare 
(9.16) with (9.13), and now you see it, don’t you?: 


B(A) = 2 f° sin(Au)du. 


1 


Inserting B(A) into (9.12), we have 


vex = f Gf sin(u)da be sin(ax)e 


or, reversing the order of integration, 
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nee eee i: “{ i “ sin(Au) sin@xje ep. ba, (9.17) 


1 


If you recall the identity 


sin(a) sin(B) = F feos(a B) — cos(a + B)], 


then (9.17) becomes 


v(x, t) = fat cos{A(u ~xyje en bay 


= fae cos{A(u + sje ea. ba, 


The inner integrals of (9.18) can be found by the simple expedient of using a good 
math table: 


(9.18) 


ia 2 1 /m__ (xy 
Mktqy = (u — x)" /4kt 
| cos{A(u — x)}e dx 5 Ve ae 


and 


| cos{A(u + x)} ew tga, n= 3 fhe (u+x)? /4kt 
0 2\k 


which says (9.18) becomes 


1 1 *° (a —x)? /akt °° (utx)?/4kt 
v(x, t) = .|f e Uh au f e du}. (9.19) 
2 Vxkt 0 


on 


Next, change variable in the two integrals of (9.19) to 


>We actually derived this integral (originally due to Cauchy) in Chap. 5 (as a generalization of the 
probability integral) where it is shown that [,“e e7* cos(2ax)dx = 5 /me~ ® If you make the 
obvious change of variables then the stated integral formula follows. 
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_ Uutx 


ye Vet 


where we use the minus sign in the first integral, and the plus sign in the second 
integral. Then, 


VE 2 
e * dy 
Ya 


VO.) = 5 : Le avitdy~ e  2Vkt dy =3/° 


avkt 2vkt 


y 


ia 6 
or, ase” is even about y = 0, 


v(x, t) = 3, [oP ey er (a). (9.20) 


Now, remember that the v(x, t) in (9.20) is not the solution to the problem we are 
actually trying to solve. The solution in (9.20) satisfies the conditions given in (9.9) 
and (9.10), while what we want is the solution that satisfies the conditions of (9.7) 
and (9.8). But that’s easy to arrange—just subtract the v(x, t) of (9.20) from one! 
That is, our final answer is 


v(x, t) =1—erf (=): (9.21) 


This works because | is a (trivial’) solution to the diffusion equation, and the 
difference of two solutions is also a solution. 

The argument of the error function is (if you recall the opening discussion on the 
dimensions of k) dimensionless, which is just as it should be.* For the Atlantic cables 
(there was more than one) a typical value of k was 6 x 10° (with length measured in 
nautical miles and time in seconds). Figure 9.1 shows a plot of Eq. (9.21) for the 
value of x = 2000 nautical miles, a typical length of an Atlantic cable. As you can 
see, it takes an appreciable time for the application of a one-volt signal at the x = 0 


Equation (9.20) soon follows from the definition of the error function: erf(x) = Fe fe ey dy, 
which has the feature erf(oo) = 1. For that reason, erf(x) is sometimes called the incomplete 
probability integral. In a 1924 note to Nature (October 25, p. 610) the English astronomer Henry 
Plummer (1875-1946) showed that an approximation for erf(x), over the interval 0<x< V3 


(an interval that applies in some practical applications), is (5) aye ALx= /3, this approximates 


erf (V3) with V3 = 0.977203 ..., which compares roughly with the actual value of 0.985694... . 


"Trivial, because it reduces the diffusion equation to the claim that 0 = 0 which is undeniably true. 
8Do you see why? Think carefully about this (but as a hint, suppose you expand the error function 
as a power-series—if the argument did have dimensions, what would that mean?) In other words, 
would the power series make physical sense? 
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Fig. 9.1 The voltage in an initially uncharged Atlantic Cable at x = 2000 nautical miles when a 
one-volt battery is suddenly attached to the end at x = 0, as a function of time 


transmitting end of the cable to result in an appreciable voltage at the x = 2000 mile 
receiving end (it takes about 7 seconds to see a half-volt signal at the receiving end).” 
It was the estimation of this time delay that showed potential investors in the Atlantic 
Cable Project what signaling speeds could be reasonably expected, which in turn 
influenced the estimation of the financial rewards such a project (a truly science 
fiction project in the nineteenth century!°) might generate. 


9.1 Historical Note 


To create Fig. 9.1 for this book, I simply used a built-in function in MATLAB to 
generate values of the error function. Before the development of electronic com- 
puters and their elegant software packages, however, analysts used pre-calculated 


°Do you wonder how what we’ve assumed to be an infinitely long cable can have a second end at 
x = 2000 nautical miles? By properly terminating a cable of finite length, electrical engineers 
learned how to make a cable of finite-length electrically ‘look’ to be infinitely long—but that’s 
really getting pretty far away from the probability integral and so [ll just leave matters at that. 
'©To understand why I call the creation of the cable ‘science fiction,’ consider this. When the 
American President Abraham Lincoln was assassinated in 1865, it took nearly two weeks for the 
news to travel to London by ship. Just a year later, with the 1866 cable open for business, that same 
message could ‘make the trip’ in just a few minutes. The success of that cable received world-wide 
acclaim (and won Thomson a knighthood). 
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tables (which are still useful today for simple, straightforward hand calculations that 
are small in scale). A ‘typical’ modern handbook of math formulas and tables will 
include a table giving the value of 


®(x)= al edt 


for the values of x from 0 to 3, in steps of 0.01 (as I type this, Iam looking at my copy 
of Schaum’s math tables). Clearly, ®(0) = 5, and the value of ®(x) for x < 0 is easily 
calculated from the tabulated values since the integrand is symmetrical about x = 0. 
From these values of ®(x) we can then get values for erf(x) with the relation 
erf(x) =2® (xv2) — 1 (vou should confirm this!). 

The first table of ®(x) was prepared by the French mathematician Chrétien 
Kramp (1760-1826), as part of a 1799 book.'' A little more than a century later, a 
much more extensive table was published by the Australian-British statistician 
William Fleetwood Sheppard (1863—1936).'” 


11 Mason Du Pré, Jr., “The First Table of the Normal Probability Integral,” Jsis, July 1938, 
pp. 43-48. Kramp was led to constructing his table while studying the refraction of light, an interest 
motivated by earlier (1754) work by Euler. 

'2w. F. Sheppard, “New Tables of the Probability Integral,” Biometrika, February 1903, 
pp. 174-190. 


Part III 
Evaluating the Probability Integral 
(More Proofs) 


Chapter 10 ®) 
Doing the Probability Integral orsst s 
with Differentiation 


With our interlude over, in this chapter we return to the calculation of the probability 
integral and I'll show you how to derive it using—seemingly paradoxically, 
perhaps—the idea of differentiation. The basic idea is this: if the integrand of a 
definite integral has a parameter that we can differentiate with respect to (like the a in 
the integral Cauchy did in 1815, discussed in Chap. 5), then the derivative of the 
integral with respect to the parameter is the integral of the derivative (see Appendix 
E for the details of what is called Leibniz’s formula). Let’s begin by seeing how that 
works with Cauchy’s integral 


I(a) = ee cos(2ax)dx. (10.1) 


Leibniz’s formula says! 


'We are dealing here with an improper integral (upper limit is infinity), and some may object to 
applying Leibniz’s rule to such an integral (what does it mean to say foo = 0?) While acknowl- 
edging such objections, I do it anyway for the first half of this chapter, but for additional 
commentary on the issue (and for one complicated way to avoid doing it) see the paper by Robert 
Weinstock cited in note 8 of Chap. 6. The late Professor Weinstock (1925—2006) was on the physics 
faculty at Oberlin College (Ohio) when he wrote his paper, and he wasn’t being pedantic. Rather, he 
was trying to develop a derivation suitable for (elementary) “classroom presentations.” Weinstock 
seems, however, to have been accepting of doing what I do here, as long as we admit that it is (as he 
writes) “ad hoc” (that’s Latin for ‘I’m doing this only because otherwise I’m stuck’). In the second 
half of this chapter, I'll show you how a math colleague of Weinstock’s at Oberlin avoided 
differentiating an improper integral. 


© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 95 
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oo 
-{ e * 2x sin(2ax)dx = —G (10.2) 
0 
where 
cs 2 
c= | e “2x sin(2ax)dx. (10.3) 


0 


Integrating (10.3) by-parts— {,°u dv =uv|> — fo vdu, with dv= e-* 2x and 


u = sin (2ax) and sov= —e * 


~ and du = 2a cos (2ax)dx—we have 
G=-e™* sin(2ax)| + 2a f e~* cos(2ax)dx 
0 0 
or 


G= 2a | e~* cos(2ax)dx = 2al(a). (10.4) 
0 


Thus, putting (10.4) into (10.2), we have 


dl 
or, separating variables, 
= —2ada 


and so, integrating indefinitely (with C an arbitrary constant) we have 
In(I) = —a? + In(C) 


or 


Remembering that 
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we see (as Cauchy derived, using the shaky ‘generality of algebra argument men- 
tioned back in Chap. 5, an argument he eventually abandoned)” 


I(a) =| e7* cos(2ax)dx = SVR. (10.5) 
0 


In Chap. 5 I mentioned Laplace’s different generalization of the probability integral 
(in that chapter’s Eq. (10.11) set a = 1): 


I(b) =| e* dx = i Vm >*,b>0. 
0 


This is easy to establish with Leibniz’s formula, as follows. Differentiating, 


dl Pat) abl 1 _yib 
SS x2 = x2 
db | e (<)ax i me dx. 


Now, change variable to x = vb and so oe = vp . Since t= vb then oe = vb or, 
x2 
— 2 
dx = J dt. 
Thus, 


or 
dl 1 _b_e 1 
—— e & ~ dt= — ——I(b). 
ab Veh iy 
So, 
dl 1 
ae ap 
TV 
or, 


> This is a beautifully elegant derivation, but I can’t resist telling you that mathematicians often can’t 
resist being entranced by new, even more complicated proofs (don’t ask me why). See, for example, 
the derivation of Cauchy’s integral, using functional equation theory (which is about twenty times 
more involved than is what I’ve just shown you), as presented in Hiroshi Haruki, “On a Well- 
Known Improper Integral,” The American Mathematical Monthly, August-September 1967, 
pp. 846-848. Professor Haruki (died 1997) was a Japanese mathematician who, as an expert in 
the theory of functional equations, was (I think) simply showing his pride in how his area of 
expertise could handle Cauchy’s integral, even though there are (far) easier ways to do it. 
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In{I(b)} = —2v’b + arbitrary constant. 


Thus, 


where 


and therefore 
I(b) =10}e" 2-4 | e-dx= Wie 
0 


as claimed. 

Okay, these two examples of Leibniz’s formula show how to use the known value 
of the probability integral to find generalizations of that integral. The big question, 
though, of how to use Leibniz’s formula to derive the probability integral itself, in 
the first place, still remains. Here are two quite different ways to do that with the aid 
of Leibniz’s formula, first with a parameter in the integrand alone, and second using 
a parameter that is not only in the integrand but also in one of the integration limits. 
This second approach has the attractive feature of avoiding the objection I mentioned 
earlier, of differentiating an improper integral. 

We start our first method with the integral (with parameter t) 


I(t) =f ee (10.6) 


and we see that I(oo) is the probability integral, the integral we are after. (From 
where, you ask, did (10.6) come from? Well, that’s the trick—or brilliance—part of 
using Leibniz’s formula!) In any case, if we make the change of variable x = ut 
(dx = t du) ) then (10.6) becomes 


00 ewe 
I(t)=t — xd 
(t) | 14+v " 


a . 
, gives 


; , fe wet oo , —t?(1+u") 
e*yy=te* | du=t/ 5—du 
9 itu Jo itu 


which, if we multiply through by e~ 


and so 
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Differentiating (10.7) with respect to t, we have 


-? oo , — (1+?) oo 2)_,-C(1+u’) 
d (: a) al e au= | 2t(1 + u’)e du 
0 0 


~ dt 1+w 1+wv 


or. 


d e* I(t) _ ae = — P2y2 
z( ; = —2e [« du. (10.8) 


Next, make the change of variable x = ut again and watch the integral on the right- 
hand-side of (10.8) become 


= 24,2 i 2 dx ve 2 
i; edu f te~* ef e * dx 
0 0 t 0 


or since, as you'll recall from (10.6), 


then (10.8) becomes 


< (2) Hs Oe Te: (10.9) 


Now, integrate each side of (10.9) from 0 to oo to arrive at, for the left-hand-side, 


Pal®)a-G*) 


The first limit is obviously zero, while the second limit is easily found by looking 
back at (10.7): 


-? oo —?(1+u’) oo a 
: I(t . = 
fin ai Se du = tan '(u) ==, 
0 9 l+u 


t0 t>0 14+? 
0 


Co 


—t? —t 
je Oe 


too t0 t 


0 


That is, 
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[ ‘ (2 )a= - (10.10) 


To finish, we next integrate the right-hand-side of (10.9): 


i —2e~"I(oo)dt = = 21(00) f e~'dt= —2P(00). (10.11) 
0 0 
Setting (10.10) equal to (10.11), we have 


— 29° (c0) = — 


Nila 


or, at last, 


and we are done. 
For a second way to use Leibniz’s formula to derive the probability integral, 
define the function* 


x 2 1 -x?(?41) 
= — e 
F(x) = tf e ar} +f eat dt. (10.12) 


Before continuing I should tell you that Professor Young implicitly acknowledges 
the non-obvious nature of F(x)—who would ever think of writing such an 
expression?—when he calls (10.12) “extraordinary!” But let’s not be ungrateful 
(why look a gift horse in the mouth!?), and so we happily accept (10.12) and 
promptly differentiate it with respect to the parameter t to get, from Leibniz’s 
formula, that 


>This analysis is based on a paper by Robert M. Young, “On the Evaluation of Certain Improper 
Integrals,” The Mathematical Gazette, March 1991, pp. 88-89. When Young wrote his paper he 
was (and still is, as I write) professor of mathematics at Oberlin College, and was a colleague of 
physicist Robert Weinstock (see note 1). The papers by Professors Young and Weinstock both use 
as their starting point the analysis published years earlier (1979) by J. van Yzeren, whose paper I 
quoted from in note 1 of Chap. 3. Professor Weinstock cites van Yzeren, but ProfessorYoung says 
of the approach only that “it deserves to be far better known than it is.” That is certainly true. The 
originator of what I am about to show you was the Dutch mathematician Jan van Yzeren 
(1914-1998), who was a member of the mathematics department at the Technological University 
(Eindhoven, The Netherlands). Originally trained as a lawyer, he didn’t receive his doctorate in 
mathematics until age 73, years after writing his paper. 
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x 1 2 -(?+1) , eS 1 
7-3 [ etahe + | axit oe ane" f earl —2x f en (P+) at 
dx Jo Jo t+ 0 Jo 
2 os 2 | 242 2 2 es 2 2 | 242 
= 2e—* {[ e-Fath —2x f ee tataae ty fetal ane [eee 
0 0 Jo 0 


In the right-most integral, change variable to u = xt (and so dt= lu ). Since u = 0 
when t = 0, and u = x whent = 1, we have 


1 x x 
phi? _,2 du 1 = 
pettan fee Sat | eta 
0 0 x X Jo 


Thus, 
w=2*{ | earl —2xe-# 3 eVdu=2e-*( [ e fa fe au) 
dx 0 X Jo 0 0 
or, 
dF 
= =. (10.13) 


This almost certainly surprising result, that the derivative of F(x) vanishes for all x, 
tells us that F(x) is actually not a function varying with x, but rather is a constant. To 
determine this constant is easy to do. Since the value of F(x) doesn’t change with x, 
we can evaluate F(x) for any particularly convenient value of x. So, let’s plug x = 0 
into (10.12) to arrive at 


2 


1 
0 5 1 0 1 
F(0) = feva i = a= | ze = tan ~'(t) 
F ) e+ ) e+ 


0 


_t 


mi 


= tan~'(1)— tan ~'(0) 


That is, 


ea 2 1 =x (tH) 
~ "dt —_,——— dt = — 10.14 
{f° } +f e+1 4’ ao 


and this is so, as I said earlier, for any value of x. In particular, it remains true as x 
becomes arbitrarily large. Since the second integral on the left-hand-side of (10.14) 
obviously goes to zero as x — oo, we have 


ore : 2 _ 
Far} == 
tf 


and so, just like that, we have 
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 e-Pat— fee! 
[fe dt= q= Ve 


An attractive feature of this method is that it is easily adapted to deriving the Fresnel 
integrals (see the second half of Chap. 5). Here’s how that works. Instead of working 
with the F(x) in (10.12), we’ll define its complex version, G(x), as 


x 2 1 i? (P41) 
aw) ={ [ eat +if <——dt, i= V—1. (10.15) 
0 9 ttl 


Proceeding as before, you should be able to confirm that ac = 0, and so conclude that 
G(x) is actually a constant with the same value for any choice of x. In particular, if 
we set x = 0 then (10.15) reduces to 


1 
G(o) =i / a =i tan ~'(t) 


Next, letting x — oo, (10.15) becomes 


a 2 1 gix?(? +1) i 
G =G(0) = e" dt> +i lim ——— dt=i-. 
(oe) =G0)=4 foePath itim [Sy a7 


We also observe that 


“1 ix?(?+1) 
lim [| <——at=0, (10.16) 
x00 fy t +1 


but the reason for (10.16) is not quite as obvious as it is for why the similar appearing 
integral in (10.14) vanished as x — oo (for some reason, Professor Young elected to 
skip over this point in his paper).* These results immediately lead (I'll let you fill-in 
the easy details) to 


“As a ‘hand-wavy’ hint as to why the integral in (10.16) vanishes as x — 00, notice that (from 
Euler’s identity) the numerator of the integral (as x — oo) becomes an arbitrarily-high-frequency 
oscillatory function that contributes equal amounts of positive and negative areas to the value of the 
integral, areas that mutually cancel. This is a particular illustration of what mathematicians formally 
call the Riemann-Lebesgue lemma: 


iim i f(t) cos(mt)dt= lim. fPt(t) sin(mt)dt=0 for any absolutely integrable f(t). Certainly 


f(t) =; ta is in that category. You can find a formal proof of the lemma on pages 70-71 of 
Tolstov’s book (see note 4 in Chap. 9). 
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1 2 D, . 
iZ= A* — BY + i2AB 
where 
lo.@) lo.@) 
A= | cos (t”) dt, B= | sin (t”) dt. 
0 0 


This all looks very much like what Professor Flanders arrived at back in Chap. 6 
when he derived the Fresnel integrals using Poisson’s trick of changing co-ordinate 
systems (but don’t forget, Flanders’ results—Egs. (10.10) and (10.11) in Chap. 6— 
are much more general, with the Fresnel integrals appearing as special cases). 


Chapter 11 ®) 
The Probability Integral as a Volume erste 


This chapter will demonstrate how introducing some physical considerations can 
result in an absolutely tremendous reduction in the complexity to the solution of a 
mathematical problem. The physical consideration we’ll use here is the interpreta- 
tion of a double integral as the volume enclosed by a surface of revolution in three- 
dimensional space. This is actually a very old idea in mathematics, and was the basis 
for how the third century BC Greek mathematician Archimedes of Syracuse was 
able to determine the volume formula for a sphere hundreds of years before Christ 
(that is, long before the invention of the integral calculus).' The idea of interpretating 
the probability integral as a volume seems to make a repeated, regular appearance in 
the mathematical literature, but the earliest version that I know of appeared in 1950, 
as a contribution to the “Classroom Notes” section of The American Mathematical 
Monthly.’ The presentation by Professors Nicholas and Yates is a model of simplic- 
ity and elegance. They argue that it “may be presented at the sophomore level,” but I 
think that a too cautious assertion. 


‘Ror how Archimedes did it without calculus, see (for example) the discussion in William 
Dunham’s book Journey Through Genius, John Wiley 1990, pp. 99-103, and the paper by Tom 
Apostol and Mamikon Mnatsakanian, “A Fresh Look at the Method of Archimedes,” The American 
Mathematical Monthly, June-July 2004, pp. 496-508. Euclid knew (in his Elements) that the 
volume of a sphere is proportional to the cube of the radius (a physicist would say this is obvious, 
from dimensional analysis), but it was Archimedes who determined the proportionality constant to 
be dn. 

2C. P. Nicholas and R. C. Yates, “The Probability Integral,” The American Mathematical Monthly, 
June-July 1950, pp. 412-413. Charles Parsons Nicholas (1903-1985) and Robert Carl Yates 
(1904-1963) were professors of mathematics at the United States Military Academy at West 
Point (New York). 
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Fig. 11.1 The geometry of Z 
Nicholas and Yates 


ry 


annular strip in x,y- plane 


They start by defining A to be the probability integral: 


A= | e7* dx. (11.1) 
0 


Now, imagine an x, y, z rectangular coordinate system (as shown in Fig. 11.1) which 
shows the curve z=e~ in the z,y plane. Then, as Nicholas and Yates write, “If the 
curve z=e~ be revolved [completely] about the z-axis, the surface [in space] 
generated is z=e7 (+9) with volume? 


veal [re (x*+y’) dxdy. (11.2) 


>The volume interpretation of (11.2) is easy to see, as follows. Think of each differential area patch 
dxdy in the x,y-plane as the base of a rectangular solid running up from the x,y-plane until it hits the 


surface e~ (*"*¥”). The differential volume of the solid is dV=e~ +") dxdy, and so integrating 
over the intervals 0 < x < oo and0 < y < ~ gives the total volume under the surface over the first 
quadrant of the x,y-plane. 
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Do you see why the “4” is there in (11.2)? It’s there because the revolution of 
z=e (#49) jg imagined to be a full 360° revolution about the z-axis, while the 
limits on the integral in (11.2) are associated with only a 90° rotation over the first 


quadrant of the x,y-plane. Continuing, Nicholas and Yates write 


v=4/ | e*axlePay=a f Ae “ay =4a [ eY dy 
J0 0 J0 0 


or 
V=4A’. (11.3) 


The last step is to now compute V in a second way, and then to set that result equal to 
4A’. The ‘second way’ is a standard freshman calculus technique using what 
Nicholas and Yates call “the method of hollow cylinders.” Here’s how that is done. 

Imagine going out in the x,y-plane a distance r= ,/x? + y” from the origin and 
then, starting at that point, we mark-off an additional interval of length dr (look at 
Fig. 11.1 again). We then revolve that interval through a full 360 rotation around the 
z-axis, thus sweeping-out a circular (annular) strip in the x,y-plane of radius r and 
thickness dr. The area of that circular strip is 2mrdr. On that circular strip we then 
erect a cylinder (a hollow cylinder) that extends upward until it intersects the surface 
z=e (*+Y") =e-”. The differential volume of the ‘skin’ of that hollow cylinder is 
dV = Cae (2nrdr) and so the total volume under the surface is the integral of dV 


over all r, that is, as r varies from zero to infinity. So, 


v= fav=/ 2mre" dr = 2n(— ze") 
0 2 


Equating (11.3) and (11.4), 


=n. (11.4) 
0 


AN =n 
or, 
A= vee dx = Ede 
— 0 “V4 2 


and, just like that, we are done. 
It is difficult to imagine a more transparent derivation than this. And yet, for some 
reason, Nicholas felt there was a need to say more, and so he did (now without 
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Yates) in a follow-up paper almost eight years later. He offered two reasons for 
doing this, neither of which I find very compelling. At the opening of his paper, 
Professor Nicholas writes of his paper with Yates, “Simplicity was achieved [there] 
by certain geometric devices, and by the assumption [Nicholas’ emphasis] that the 
improper [probability] integral is convergent. The feeling of uneasiness that goes 
with this sweeping assumption can be avoided by [this new paper].” 

In fact, the assumption that the probability integral is convergent is easily 
established by using the well-known technique of comparing it to a known conver- 
gent integral that is /arger than the probability integral. To be specific, since 


co . 1 : foe) 5 
i e * ax= | eMax+ | e “dx 
0 0 1 


and since iG e-* dx is clearly finite (the integrand is finite at every point in a finite 
integration interval), and since i i. ee dx < f i. e *dx (because e-* <e-* for 


x > 1, we have 
oS 5 
| e *~dx< —e * 
1 


and so deo e-* dx <oo. As for what “geometric devices” in the first paper were 
bothering Professor Nicholas, he provides no specific details. What he did do was 
evaluate the volume integrals by performing a three-dimensional version of the 
squeeze derivation that we did as a (much) simpler two-dimensional analysis at 
the end of Chap. 6. 

As I stated earlier, the volume interpretation of the probability integral seems to 
be rediscovered on a regular basis,” but one analysis in particular does offer an 
interesting twist on what Nicholas and Yate’s did.° Jameson asks, in his opening 
paragraph, “Has anyone seen [this] before?” and the answer is yes. But still, the twist 
is interesting. Jameson starts (as did Nicholas and Yates) with the curve z=e -Y in 
the z,y-plane and then revolves that curve around the z-axis (as did Nicholas and 
Yates). With I denoting the probability integral written as 


I=] e-* dx 


we then write I° as (this should be old hat for you by now) 


Co 


1 


4C_ PLN icholas, “Another Look at the Probability Integral,” The American Mathematical Monthly, 
December 1957, pp. 739-741. 

5 Alberto L. Delgado, “A Calculation of foe -x dx,” The College Mathematics Journal, September 
2003, pp. 321-323. 


°Timothy P. Jameson, “The Probability Integral by Volume of Revolution,” The Mathematical 
Gazette, November 1994, pp. 339-340. 
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Fig. 11.2 The geometry of Z 
Jameson’s calculation 


Py 


=f eFax e -Ydy= ie ra (?4y’) dxdy 


and thus I* is the volume between the surface z=e~ (49°) and the x, y-plane (see 
note 3 again). Note, carefully, that there is no factor of “4” here (as we wrote in 
(11.2)) because now the limits on our double integral run from —oo to oo, not from 
0 to oo.Now, unlike Nicholas and Yates, Jameson computes the volume in a way 
different from the use of hollow cylinders. Rather, he visualizes the volume as a 
stack of differentially-thin disks centered on the z-axis (see Fig. 11.2). Each disk has 
radius y and thickness dz, and so the differential volume of each disk is 


dV =ny°dz. 


As y runs from zero to infinity, we see that z runs from 0 (when y = oo) to | (when 
y = 0) and so 
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1 
v= fav=x} y'dz. (11.5) 
0 


Since y = —In (z), (11.5) becomes (this is Jameson’s ‘twist’) 


=-nx(-l)=n2 


V= -af In(z)dz = —x[zln(z) —z] ; 


0 


because (as shown in note 2 of Appendix E) lim, zin(z) =0. Thus, 


és 2 
v=P=s={ / eax 


and so, just like that, we have 


[ e8ax=ve (11.6) 


Chapter 12 ®) 
How Cauchy Could Have Done It cos 
(But Didn’t) 


For many years, the general feeling among mathematicians was that there are some 
integrals that appear to be, ironically, simply ‘too complex’ to be evaluated by 
complex function theory. The probability integral i e-* dx was the usual case 
put forth in support of that view. For example, the English mathematician George 
Neville Watson (1886-1965) wrote, in his 1914 book Complex Integration and 
Cauchy’s Theorem, “Cauchy’s theorem cannot be employed to evaluate all definite 
integrals: thus Ae e~*'dx has not been evaluated except by other methods.” Two 
decades later this gloomy opinion was repeated by the English mathematician 
Edward Thomas Copson (1901-1980) in his 1935 book An Introduction to the 
Theory of Functions of a Complex Variable. There we read “quite simple definite 
integrals exist which cannot be evaluated by Cauchy’s method, is e-* dx being a 
case in point.” 

Then, in 1947, the British mathematician James Cadwell (1915—1982) published 
a brief note (The Mathematical Gazette, October) in which he showed how to do the 
probability integral by performing two successive contour integrations, first around a 
rectangle, followed by another around a pie-shaped sector of a circle. In an adden- 
dum, however, Cadwell states that he had learned, since writing his original note, 
that the American-born British mathematician Louis Joel Mordell (1888-1972) had, 
decades earlier (!), already evaluated via contour methods the much more general 


WD at oa dt which reduces to the probability integral for the special case ofa = — | 


and b =c =d=0.' Shortly after Cadwell’s note appeared, the Russian-born English 


Alert! Don’t even try to read this chapter until you have read Appendix F 
(maybe twice) 


"You can find Mordell’s quite difficult (in my opinion) analysis in the Quarterly Journal of Pure 
and Applied Mathematics (48) 1920, pp. 329-342. 
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mathematician Leon Mirsky (1918-1983) showed how to do the probability integral 
in an even shorter note (The Mathematical Gazette, December 1949) using just a 
single contour (a skewed parallelogram). 

That same year, the Stanford University mathematician George Polya 
(1887—1985)—-see note 6 in Chap. 1—published a derivation of the probability 
integral using the same skewed parallelogram that Mirsky used.” (Neither Mirsky 
or Pélya cited the other, so it’s not clear what the connection between the two men 
may have been—but a footnote at the end of Polya’s paper indicates it had been 
written no later than November 1945, four years before Mirsky published). 

With the appearance of the Pdlya/Mirsky papers, discussions on doing the 
probability integral by contour integration seemed to have run their course. Then, 
fifty years later, a new paper appeared.” There we read “By now ... it is fairly well 
known that the probability integral can indeed be evaluated by a contour integration 
of a suitable integrand around a suitable skew (i.e., non-rectangular) parallelogram. 
Such a proof was given by Mirsky as long ago as 1949.” (There is no mention of 
Pélya’s paper by Desbrow.) Desbrow goes on to say that “The primary aim here is to 
demonstrate [in part] why rectangular contours cannot [my emphasis] work for the 
probability integral.” In other words, Desbrow seems to be claiming a skewed 
parallelogram contour must be used. Desbrow’s evaluation of his generalized prob- 
ability integral is perfectly correct, but a claim that the contour must be skewed is not 
correct. 

To demonstrate that, what I'll show you now is an analysis using a rectangular 
contour that almost effortlessly (with a lot of subtle cleverness!) evaluates the 
probability integral.* As you'll recall from Appendix F, our analysis will be that of 
calculating both sides of 


} f(z)dz = 2ni y residues of the singularities inside C (12.1) 
Jc 


where the proper selections of the function f(z) and the contour C are the central 
issues. Without any further word on motivation (but with a frank admission that 
selection of f and C are all-important) we'll use 


°G. Pélya, “Remarks on Computing the Probability Integral in One and Two Dimensions,” Pro- 
ceedings of the Berkeley Symposium on Mathematical Statistics and Probability 1949, pp. 63-78 
(in particular, see pp. 68-69). 

3Darrell Desbrow, “On Evaluating f i. e*(x~2>)dx by Contour Integration Around a Parallelo- 
gram,” The American Mathematical Monthly, October 1998, pp. 726-731. 

The analysis I’m about to take you through has been in the mathematical literature for some time. I 
first saw it in the textbook by the German mathematician Reinholt Remmert (1930-2016), Theory of 
Complex Functions, Springer 1991, pp. 413-414, but Remmert credits the German mathematician 
Hellmuth Kneser (1898-1973) with the original solution, dating from the mid-1930s. I have 
elaborated on Remmert’s presentation just a bit. 
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Fig. 12.1 The contour C and the three nearest singularities to C (each marked with a cross) of f(z), 
showing that only one singularity is inside C 


en? . /t 
f= eae a(t Daf (12.2) 


and take C to be the rectangular contour shown in Fig. 12.1, with the edges labeled a, 
b, c, and d. For now, R is some finite value, but we’ll eventually let R — oo. And 
finally, we note in passing that 


1 


a? = lata 5 la +o =5(1+)(14+)=F0+i+i-1) 
| a’ = ni. (12.3) 


Pll remind you of (12.3) later in the analysis. 

A look back at the end of Section F4 in Appendix F should convince you that f 
(z) in (12.2) is, indeed, analytic everywhere on and inside C except for the values of z 
where the denominator 1 + e “” = 0 (that is, at the locations in the complex plane of 
the singularities of f(z)). To find the singularity locations, we write 


e282 == = e2kt Iai 
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where k is any integer. Thus, 


— 2az = (2k + 1)zi 


= 0+ (8) aegteg = e+ (5) 


=e D3 ( 5) +i)= —(2k+1)5ak=0, #1, 42, 43,..., 


That is, f(z) has an infinity of singularities, located at the odd multiples of 5a. All 
of these singularities are seen to be on the diagonal line y = x, and the three crosses 
in Fig. 12.1 show the locations of the first three singularities closest to C. The lone 
singularity inside C is at z= 5a. I'll discuss the nature of this lone singularity (that is, 
its order) when we calculate the residue of that singularity. 

Let’s now work through the left-hand-side of (12.1), and write-out the contour 
integral edge-by-edge as we go around C counter-clockwise, starting with edge a. In 
doing that, Pll use the following for each specific edge: 


edgeas z=x,dz=dx 

edgeb: z=R+iy,dz=idy 
edgec: z=x+i 5», dz = dx 
edged: z= —R+iy,dz=idy. 


Thus, 


R m 4 (RtiyY -R Fs 0 e-CRtiy) 
— 7 Lg | ; 
filed [ta yax+ f ; — a aateryy  Y + | i(x ti 5) ! [ : — aac Ry! FY: 


(12.4) 


Eventually we are going to let R — oo in (12.4) (but not quite yet), and when we do 
you can see that the second and fourth integrals (the ones for the vertical edges b and 
d of C) will go to zero because of the e~® in the numerators of the integrands. So, 
anticipating that, we ignore those two integrals and write 


é f(2)dz= jim | ie “floes a | - (« zy iJ) ax 


or, 
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In the right-most integral of (12.5) replace i Js with its equivalent from (12.2) (that 
is, i\/}=a— ,/5 ) and then (12.5) becomes 


é f(2)d2= tim | i “flan - / g (x $a 4) ax . 


Next, make the change of variable to y = x — aa (and so dx = dy) to get 


ftle)de = tim [00 — [go +a)dy}. 


Now let R —> ov to arrive at 


¢ f(z)dz = / ” FRG) — Ee Pa) aie (12.6) 
Cc —0o 


Well, okay, you say, this all looks fine but still, at this point you may well be 
wondering if we have actually made any real progress? Well, we have, because of 
the following calculation of f(x) — f(x + a). 


—x? — (x+a) 
f(x) —f(x+a)=— i 


2 2 2 2 
—x* — x* — 2ax —a* 
e e 


1 + e-2ax 1 + e— 2a(xta) _ 1 +e72ax 1 + > 2ax — 2a? 


—x? —x? —2ax,,—a* 


_ e e e 
a 1 + e- 2ax 1 +e 2aX%e- 2a" 
Recalling (12.3), 


or,ase = —1 and eo" = 1, we have 
2 a 2 
e~* e-* — 2ax e* (1 +e~ 78) 
f(x) —f(x+a)= i a ro = 1+ e-2ax 
ago 


So, (12.6) reduces to 


ftleiae = fe *a (12.7) 


and we see that the contour integral of (12.1) is precisely the probability integral! 
To calculate the value of the contour integral (and thus that of the probability 
integral) we calculate the right-hand-side of (12.1) as follows. With f(z) = at , the 
zeros of h(z) are the locations of the singularities of f(z). For our problem, with 
g(z)=e ~® and h(z) = 1+e ™, we have an infinity of zeros, with each occurring 
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exactly once. That is, the singularity associated with each zero is first-order and so, to 
calculate the residue of the lone singularity inside C, we use Eq. (F8.9) in Appendix 
F (in section F8): 


h’(zo) 
So, 
_2 1,2 Le 
=a = {ey 5 
bi = (l+e #8) —4 4 4 -= e ao cos (x) ~i sin (zn) aoe = 
a 20 — 2ae-* —2ae—™ 2a 
dz z=ta —2ae “2 
1 fi-i y\ oy G-id-i)_ 1 f1-i-i-1\ 1 (-2i/)_ 1 
(2 2(1 +i) = 7a (sn) —~ 2/a\ 1-1 sal ) ONES 


Thus, from (12.1) and (12.7), 


o 2 1 
ate jefe TV \ 
Je x ni( is) Ja 


and we are done. 

When you compare this analysis with the amazing brevity of Poisson’s derivation 
in Chap. 6, you might be reminded of the old saying about killing a fly with a cannon 
ball. Cauchy’s contour integration does unleash a massive overkill of firepower, but 
remember—for a long-time people thought doing the probability integral via contour 
integration was simply not possible. Now you know that isn’t so. 


Epilogue 

All of Cauchy’s thinking about e-* led to one of his particularly curious discov- 
eries. As everyone taking freshman calculus learns, any ‘sufficiently smooth’ func- 
tion f(x) (‘sufficiently smooth’ means all of the function’s derivatives exist) can be 
written as a power series expansion about x = 0: 


0 f(0 
f(x) = 2 ) xk 


where £0) is the kth derivative (k > 1) of f(x) evaluated at x = 0, and where 
f(0) is interpreted as £(0). That is, 


x3 


f(x) =f(0) + f'(0) + f”(0) x +£"(0) 3 


a iNcds (12.8) 
You surely recognize (12.8) as a special case of the Taylor series expansion of f(x), 
as this power series is in every calculus book in existence, past and present, and will 
be (I'd wager) in all the calculus books yet to be written. 


The validity of the claim that the right-hand-side of (12.8) always converges, for 
every x, to f(x), was taken as gospel as the eighteenth century came to a close. 
Indeed, in 1797 the French-Italian mathematician Joseph-Louis Lagrange 
(1736-1813) wrote an entire book based on the assumed truth of (12.8). And then, 
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Fig. 12.2. Cauchy’s function around x = 0 


in 1823, just ten years after the second edition of Lagrange’s book appeared, Cauchy 
showed Lagrange’s assumption is not true! Cauchy did that by the direct method of 
discovering a specific f(x) that, while infinitely differentiable, has a power series 
expansion that does not converge to f(x). Specifically, Cauchy’s function is 


(12.9) 


which is continuous everywhere. Figure 12.2 shows what Cauchy’s function looks 
like. 

If you start to calculate the derivatives of Cauchy’s function, you get for the first 
two 


2 sa. 6 4)\ -1 
f'(x)= ae = and "(x)= (- S+a)e 2, 


These two are enough to make the general pattern clear: all the derivatives are of the 
1 
form of a sum of various powers of : , with the sum multiplied by e *. What makes 


this particularly interesting is what happens to all those derivatives as x > 0. 
1 
e x 
+m 


When we attempt (by setting x = 0) to evaluate 
positive integer, we get the seemingly unhelpful 


, where n is an arbitrary 


118 12 How Cauchy Could Have Done It (But Didn’t) 


NIH 


ae: 0 
lim =.=? 
x0 x? 0 


But, of course, this is the form of an indeterminate expression that is just what we 
need to be able to use L’ H6pital’s rule. That is, with y = 1 (and so y > co as x > 0) 
we have 


where the dots indicate yet more applications of L’H6pital’s rule. With each succes- 
sive application of the rule, we reduce the power of the numerator by 2 until we reach 
either O or 1. Since the denominator remains e" we see that 


for any given positive integer n. Thus, all the derivatives of Cauchy’s function are 
zero at x = OQ, and that is the mathematical reason behind the hard-to-overlook ‘flat 
bottom’ of Cauchy’s function at and around x = 0. Putting £0) = 0 into (12.8), we 
have for any x the series 


0+04+0+...=0 


= 


which is certainly note». 

So, how did Cauchy come up with (12.9)? I’m sure there are any number of 
scholarly ‘explanations,’ but I suspect that the real story is simply that, after 
pondering e-* for a long time, what could be more ‘natural’ for a curious 
mathematician to do than to flip the x* upside-down, if only just to see what 
happens? I think that’s what Cauchy did, and boy (!), did he see something happen! 
Or, not to be (too) flippant, he saw absolutely nothing happening in the neighbor- 
hood of x = 0.° 


>This is the mathematical equivalent to the famous clue that caught the attention of Sherlock 
Holmes in the 1892 tale ”’The Adventures of Silver Blaze’ — the dog that did nothing in the night. 


Chapter 13 ®) 
Fourier Has the Penultimate esi 
Technical Word 


We start with the Fourier transform pair (and if this is unfamiliar to you, see 
Appendix G): For a given function f(t) we write its Fourier transform as 
F(@) = f~f(t)e~ dt, and then the inverse Fourier transform is given by 
f(t) = + f>_F(@)edo. Electrical engineers, in particular, associate t with time 
and @ with frequency, but for mathematicians such physical associations are not 
necessary. For them (and us) t and w are simply symbols with no special meanings. ' 
F(@) and f(t) each uniquely define each other, and we write f(t) «+ F(@) to indicate 
their one-to-one correspondence. 
Now, suppose 


f(t)=e"*, —co<t<oo. (13.1) 


Notice, carefully, that ; lim f(t) =0, an observation that will be of importance in 


just a few more steps. The derivative of f(t) is 
f'(t)= —te~* = —tf(t) (13.2) 


and so, taking the Fourier transform of each side of (13.2), we have 


a f'(t)e dt = -[- tf(t)e~ "dt. (13.3) 


— oo — co 


Using integration-by-parts on the left-hand-side of (13.3), that is, writing 


‘When I wrote that line I was reminded of these curious words (from the 1901 essay “Mathematics 
and the Metaphysicians,” written by the British philosopher and logician Bertrand Russell 
(1872-1970)): “Mathematics may be defined as the subject in which we never know what we are 
talking about, nor whether what we are saying is true.” Russell was almost surely (I hope!) simply 
trying to be dramatic and (perhaps) went just a step too far with the never. 
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[vav=(w) -{ v du 


with u =e (and so du = — iwe “dt) and dv = f (t)dt (and so v = f(t)) gives 


‘| f(t)e~ at =f (te! 


— oo 


+f ine (ati [ f(t)e~ dt 


because (you'll recall), lim f (t) =0. So, 
io | f(t)e~ dt = — / tf(t)e dt 


or, 


i@F(@) = — 7 = tf(t)e~ dt. (13.4) 


— co 


Now, differentiating F(@) with respect to a, we have 


a ae ae ee — int 
nf. itf(t)e “dt = if aye dt 
or, 

” tf(t)e~ dt = — “ a (13.5) 


— oo 


Putting (13.5) into (13.4), 


or 
dF 
oF(@) = — an 
and so 
. = —adw 


which integrates to give 
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In{F(@)} = — fo? + arbitrary constant 
or, with C some constant, 
F(@) =Ce~*”. (13.6) 


Indeed, notice that C = F(0). 
From the definition of the transform integral, we have 


F(0) = [= fief 


and so (13.6) becomes (with an obvious change of variable) 


F(o) =e [ e Hamel | e dy. (13.7) 


io. <) Co 
alt 1 —hy _- 1y? iot 
ev = — e? e » dy pe’ dw 


1 ne 1,,2 yy oe Tee. oa 
a —5y — 50° ,iot 
-aif_e 2 ay} [ie e to. 


Now, set t = 0 in (13.8). Then (with another obvious change of variable), (13.8) 
reduces to 


1 eo 1,2 ee 12 1 aa 1,2 
as, ay — 70" a = 5X" 
i=m{ fe 2 ay\{ [e ao} ati ax} 


and so, just like that, we arrive at the probability integral 


(13.8) 


/ e- dx = V2n. 


Chapter 14 ®) 
Finbarr Holland Has the Last Backes 
Technical Word 


As our final, spectacular derivation of the probability integral, what follows is an 
elaboration of an elegant little bit of recent analysis that I came across almost by 
accident.’ I was so impressed by Professor Holland’s derivation that I decided it had 
to be the concluding analytical chapter of the book (and so Fourier was moved from 
having been the book’s original analytical closer to having the next-to-the-last 
technical word). 

Consider the expansion of (e™ + e~*)*" by the binomial theorem. That is, write 


(ex 4 eo xy" = . ‘ ( 2) (e*)K(e- x)" -* = pete ~#2a-Wx 
=0\ 4 = 


= 2n ikx , — i2nx gikx __ 2n i2kx , — i2nx 
= 2x=0 © em = dicate 


; 2n ; 
= eo = ( ) a 2n el2({k—n)x_ 


k=0 
k#n 
Since 
(e* A: eo) _ [2cos(x)]" — 72" ogg n(x) 
we have 


"See the paper by Finbarr Holland, cited in note 4 of Appendix C. The title of Professor Holland’s 
paper asserts it is all about Stirling’s formula (which is why I was reading it as I wrote that 
appendix). I eventually decided, however, that it was at much too deep a level for this book, and 
so turned to another source for Appendix C. But all was not lost, as at the end of his paper (in an 
almost throw-away fashion) Professor Holland presents the beautiful derivation of the probability 
integral discussed here. 
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27" cos"(x) = ‘ie + aa ac. (14.1) 


n 
k#n 


Now, integrate (14.1), term-by-term, from —z to x. That gives us 


7 n 1 2n 7 1 2n 7 1 —n)x 
[cos ex= 7 ( : ) forget fe ie 
k#n 


1 2n 1 2n ei2(k—n)x 
— —— 2 — = ——__—_____,, 
2 ( *) a (01 Sea 


k#n —t 
Since 
e2(kK-n)e _g-2(kK-n)x 97 sin{2(k —n)x} 
tk —a) = 2(k —n) Ofor k#n 
then 
1 [* 2n _ 1 /2n _ 
xz | 8° (x)dx = 55 (4"), 1=0,102, 3.4n (14.2) 


Put (14.2) aside for now (but not for long) and define 


n 


Jn (°°) /n(2n)! (14.3) 


Cy = = = ‘ 
gn (n!)?27" 


Thus, 


a Went 2" 2n+ 1 2*"(n!)* 


oe aon | ns 


(2n +1) con 


As n — oo the expression in the first set of square brackets on the right limits to 5 
and from Eq. (B.6) in Appendix B we see that the expression in the second set of 
square brackets on the right limits to 2. So, 


lim c= 12 = ul 
n—>0o 2k 1 


or, 
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1 
lim cy, = —. 14.4 
n— 00 Ja ( ) 
Now, notice that (14.3) can, if we remember (14.2) as well as recall the shape of the 

cosine curve, be written as 


n me n 
_ 2vn {1 — sin?(x) "dx. 


nm Jo 


dt_ _ _ dt 
cos(x) ~ 4/1 2’ 


Next, change variable to t = sin (x). Then dt = cos (x)dx or, dx = 


and so 


va 2)n dt ae 2)n-1/2 
: w Jo { } v1-? Jo { } 


Making another change of variable, to t= Ti (and so dt = *), we have 


; _ 2A Jn i 52 n—1/2 dé _2 Jn ees ase 
"nt Jo n vn To n 


So, using (14.4), 


Jn 2 n—1/2 
fies = Ha 2 bas ds 
n—oo Ja nowy n 


or, assuming we can interchange the limiting and the integration operations 


00 2)n-1/2 00 {1- zy 
= | tim {1- =} as== | lim c= 
T Jy 27% n T Jy 27% yi-2 Va 


and so 


i lim Vee gs ii (14.5) 
0 s? 
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Now, as n — oo the denominator of the integrand in (14.5) limits to 1,” while the 
numerator limits (recalling one of the more famous limits in calculus) to e~* . That 


is, we are left with 
co I 
/ e ‘ds==Vn 
0 2 


and we are done. 


This is a plausibility step, at best, since s itself becomes arbitrarily large over the interval of 
integration. What we can say is that for any finite value of s the denominator limits to 1. 


Chapter 15 
A Final Comment on Mathematical Proofs | 3x 


All the proofs in this book have (unlike the one in Fig. 15.1) been successful. That’s 
not always the outcome, of course, and in your career as a mathematician, physicist, 
or engineer, you’ll almost surely begin analyses that at some point come to an 
unhappy end. This depressing situation was well-captured in the powerful 2016 
novel The Doubter’s Almanac by Ethan Canin. There we read the (often appalling) 
story of Milo, a math genius who attempts to solve a long-standing problem of 
stupendous difficulty.! The going gets plenty rough for Milo. As he begins his quest, 
we read 


At first consideration, the problem appeared almost facile; but the essentials quickly hid 

themselves. He’d begun to conceive of the proof as a fortified castle pierced by ten thousand 

brightly painted doors, each of which was designed to deceive him. All ten thousand would 

open—this wasn’t the issue—but so far, none of them had allowed him entrance. 
—Perhaps none ever would. 


Eventually Milo comes to the realization that what would be needed is not just 
endless, methodical work like “a man moving a load of gravel with a wheelbarrow,” 
but rather 


Brilliance. Luck. A moment of godly imagination. He would need all these things but like 
every mortal could do nothing to summon them. He could only hope, as he sat silently at his 
desk, that one of them might one day make a wary visit. He was aware that he’d set course 
for a shore that he might never reach. 


Milo is a thoroughly despicable math whiz, tolerated by his colleagues in the mathematics 
department at Princeton University only because he is—when sober—brilliant. When not sober 
he projects arrogance and alcohol fumes in equal measure and, at one point, assaults a Nobel prize- 
winning economist with a tree branch. After that unfortunate episode, his behavior takes a turn for 
the worse. 
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Cate BY 


\s Ean 
aha « 
ZN, Re X= Ne 
+ 


“He was on a roll up to this point.” 


Fig. 15.1 When things don’t go well. (Drawing by arrangement with CartoonStock/UK.) 


Developing a valid proof is hard work (that’s why it’s called research!), but don’t 
give up—and certainly don’t let it drive you nuts! Running into rough spots happens 
not just to fictional characters like Milo, but even to the Greats. In a September 1805 
letter to a friend, for example, Gauss described how a failed search for a proof had 
“tormented” him for years: “This deficiency ruined for me all the other things I had 
found. For 4 years, a week seldom passed without my making one or another futile 
attempt to loosen [the] knot .. . Finally, a few days ago I succeeded—not through my 
laborious searching but merely by the grace of God I might say. Just like lightning 
striking, the puzzle resolved itself.” So, if you get stuck, keep trying until Gauss’ 
lightning or Milo’s “godly imagination” strikes! 

I hope this book has been an inspiration for you in seeing how there can often be 
more than one way in developing a mathematical proof. 


Appendices 


Appendix A: The Binomial Theorem 


This is not a book on probability, but the probability integral gets its name from how 
it was discovered by De Moivre, one of the pioneers of probability. This appendix 
will tell you everything you need to know about probability in order to read this book 
(which, really, is not very much!). Everything you read here would be covered in the 
very first (at most two) lecture(s) of an introductory probability course, and most of it 
(but perhaps not all) will be intuitive, in any case. You do not have to take just my 
word for this claim. In 1717 De Moivre published The Doctrine of Chances, a 
mathematical treatment of many of the questions presented by various gambling 
schemes. Doctrine eventually fell into obscurity, and when Gauss and Laplace 
independently discovered (decades later) many of its results, it was they who 
received credit (see note 4 in Chap. 1). As one of the pioneers of probability, De 


Moivre gave this advice in Doctrine’s Preface: 
On this occasion, I must take notice to such of my Readers as are well vers’d in Vulgar 
Arithmetick, that it would not be difficult for them to make themselves Masters, not only of 
all the Practical Rules in this Book, but also of more useful Discoveries, if they would take 
the small Pains of being acquainted with the bare Notation of Algebra, which might be done 
in the hundredth part of the Time that is spent in learning to write Short-hand. 


To measure the value of what you’ll read here, consider this question (one that De 
Moivre would have considered to be ‘elementary’): If you toss a fair die twice (a fair 
die has six faces, each with probability zof showing on a toss), what is the probability 
you get at least one 6? Can you answer that, right now? If so, great. But if not, you 
will be so able by the time you finish reading this appendix. So— 

Imagine we perform a particular ‘experiment’ which has just two possible out- 
comes. If the experiment is that of flipping a coin, for example, one outcome is heads 
and the other is tails. Which outcome occurs on a flip is probabilistic, which means 
that while we don’t know which outcome will occur, we do know the probability of 
occurring for each outcome. For a fair coin the probability of heads on any particular 
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flip is 5, as is the probability for tails. The two probabilities add to | (the probability 
of certainty) because, while we don’t know which outcome will occur, we do know 
one of the two will occur. More generally, to account for the case of an unfair 
(or biased) coin, we’ll write p as the probability of heads and | — p as the probability 
of tails. Both p and 1 — p are probabilities, and so each takes its value from the 
interval 0 to 1. 

If we flip the coin twice, and if we assume the individual flips are independent, 
then we can multiply probabilities (this is an axiom of probability, and serves as the 
very definition of independence). So, the probability of two independent consecutive 
heads is p’, the probability of two independent consecutive tails is (1 — p)”, and the 
probability of one head and one tail is 2p(1 — p). (The 2 is present because one head 
and one tail in two flips can happen two different ways.) Notice that 


p+ (1—p)’ + 2p(1 —p) =p? + 1 — 2p + p* + 2p—2p?=1 


because we have accounted for everything that could possibly happen in two flips 
(and something must happen). 

Consider now the general case of flipping a coin n times. What is the probability 
of getting exactly k heads in n flips? To answer that question, let’s first calculate how 
many different ways we can get exactly k heads, where k = 0, 1, 2, 3, ..., n? We 
imagine that flipping the coin n times generates a sequence of symbols of length n 
(a sequence consisting of k H’s and n — k T’s, with each symbol written on its own 
slip of paper). Except for the symbol, the slips of paper are indistinguishable. Further 
imagine there are n empty boxes in front of us, labeled 1 to n. Into each box we’ll put 
one of the paper slips marked with either an H or a T. The symbol on the slip we put 
into the box labeled j will represent the outcome on the jth flip. We’ll start with the k 
H slips. 

For the first such slip, there are n choices for the box that gets the slip (marked 
with an H). For the next H slip, there n — | choices among the remaining empty 
boxes, for the third H slip there are n — 2 choices among the remaining empty boxes, 
and so on, with there being n — k + 1 choices among the remaining empty boxes for 
the last slip marked with an H. So, the number of ways to put the k slips marked with 
an H into the boxes is 


n(n— 1)(n—2)(n—3)...(n—k+ 1) 
=([n(n—1)(n—2)...(n—k+ 1)] nn) 


The n — k still empty boxes then each get a slip marked with a T. 

Notice that this process counts as distinct (that is, distinguishable) the order in 
which we put the k H-slips into the boxes. For example, if k = 3 and n = 50, and we 
put the three H-slips into boxes labeled 17, 31, and 47, then the above process counts 
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the orders 17, 31, 47 and 31, 47, 17 as distinct, while for our coin flipping experiment 
they are not distinct. Each order aaard in the same final result: a heads on flips 
17, 31, and 47. So, we should divide ("pi ip by the number of permutations of the k 
slips, that is, we should divide by KI Thus, the number of different ways to get 
exactly k peas in n flips (that is, the number of distinct sequences of k H’s andn — k 
T’s)i IS 7 Cessiee which is written as 


aay = (a): 


Notice that this expression tells us that 


n n! n! 
(oa) ~ (a—K)l(n—[n—k)!  (n—k)IKI 


and so 


Finally, 
n ‘ : 
() =Oifk<Oorifk>n. 


Each of the G) ways to get exactly k heads on n flips occurs with probability p“(1 — 
p)" — *, and so the total probability of exactly k heads on n flips is (*)p*(1— ae 
For a fair coin, with p= 5, the probability of getting exactly k heads in n tosses is 
1 (n 
» ({). 

The notation (2) is called a binomial coefficient because of its appearance in the 


binomial theorem. Since the time (1654) of the French mathematician Blaise Pascal 
(1623-1662) it has been known that 


(atb)"=S- (2b (A.1) 


'The assertion is physically obvious because it is impossible to get either a negative number of 
heads, or more heads than there are tosses. It follows mathematically because, in either case for k, 
we’ll get the factorial of a negative integer (which blows-up) in the denominator of (?). Here’s how 
to see that. Start by noticing that m ! = m(m — 1)! a so (m—1)!= ml, Now, if m = 1 then 
O!=1=1. So, if m = 0 we have (—1)!=% =1= +00. And ca m = — | we have 


(—2)! co a oo. And so on. 
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where n is a non-negative integer. Shortly after 1665 the English mathematical 
physicist Isaac Newton extended the binomial theorem to negative and non-integer 
values of n using less than rigorous arguments, arguments which were later (1826) 
provided by the Norwegian mathematician Hendrik Abel (1802-1829). 

Notice that if a = p and b = (1 — p), the binomial theorem says 


(p+1—p)"=1=>y_,(,)p*a-py* (A.2) 


for any n> 0 and 0 <p < 1. This says the probability of getting exactly 0 heads out 
of n flips, plus the probability of getting exactly 1 heads out of n flips, plus ... plus 
the probability of getting exactly n heads out of n flips, is 1. And that is exactly what 
we’d expect. In Appendix C [ll show you a far more sophisticated probability 
problem involving binomial coefficients. 

Okay, to finish, have you calculated the answer to the question that opened this 
appendix? You now know enough probability to do it two different ways. First, to 
get at least one 6 on two tosses of a single die means you got exactly one 6 or you got 
exactly two 6’s. To get exactly one 6 means the first toss was a 6 (with probability a 
and the second toss was anything but a 6 (with probability 2), or the first toss was 
anything but a 6 and the second toss was a 6. So, the probability of getting exactly 
one 6 with two tosses is 


(6) (6) e (6) (5) = 7 = %6 - 56 


The other possible way to get at least one 6 is to get two 6’s, with probability 


(5) (6) = 36 
6/\6/ 36° 
So, the total probability of getting at least one 6 is i. 
An alternative (and perhaps easier) way to compute the answer is to recognize 
that the probability of getting at least one 6 is 1 minus the probability of getting no 


6’s. Getting no 6’s means each of the two tosses showed anything but a 6, with 
probability 


: : 25 _ 
and so (again) the answer is 1 — 7 = x. 


Here is a little challenge question for you to think about that’s a bit deeper, a 
question typical of the sort of probability questions that De Moivre received from his 
London gambling clients. Suppose you toss two fair dice six times. At the comple- 
tion of each toss, you record the sum of the two dice. What is the probability the first 
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sum is different from any of the sums recorded on the next five tosses? A computer 
simulation (a Monte Carlo probabilistic simulation—see the following box) of this 
process estimates the probability to be ~0.56. See if you can calculate the exact 
probability. Hint: The exact probability is given by the ratio of two integers (9-digits 
in the numerator and 10-digits in the denominator). If you write to me at paul. 
nahin @unh.edu [ll tell you if your answer is correct. If you get stuck, write to me 
anyway and I'll e-mail you a detailed solution. 

The MATLAB code dtoss.m (for double toss) simulates a game of tossing two 
fair dice six times and, after each tossing, recording the sum of the dice. At the 
completion of the sixth toss, the code checks to determine if the first sum is different 
from each of the next five sums. The code does this for ten million games. (Note: the 
line numbers at the far-left are NOT part of MATLAB, but have been inserted purely 
as reference tags.) On a quite ordinary, several years old laptop the simulation of ten 
million games (using 120 million random numbers) required less than four seconds. 


sdtoss.m 
01 bingo=0; 
02 for loop=1:10000000 


03 EOmEOS S—He16. 

04 R=6*rand; 

05 di=floor (R) +1; 
06 R=6*rand; 

07 d2=floor (R) +1; 
08 s (toss)=di+d2; 
09 end 

10 match=0; 

ILL for k=206 

AIP Hie (iL) —=—=155 (9) 
13 mateni=i¢ 

14 end 

ILS end 

16 if match==0 

17 bingo=bingo+1; 
18 end 

19 end 


20 bingo/loop 


Line 01 initializes the variable bingo to zero, where the value of bingo is the 
current number of games in which the last sum is different from any of the first five 
sums. Lines 02 and 19 define the loop that simulates ten million games. Lines 03 and 
09 define the loop that simulates an individual game. Lines 04 and 05 simulate a toss 
of the first die, with R being a random number from the interval 0 < R < 6 (notice 
that the end-points, 0 and 6, are not possible values for R). Line 05 rounds down to 
the greatest integer Jess than R and then adds | (for example, floor(4.3) = 4 and so 
floor(4.3) + 1 = 5). Thus, d1 is an integer from 1 to 6. Lines 06 and 07 do the same 
for the second die (d2). Line 08 computes the sum of the two dice and stores the 
result in the six-element vector s. When the six double tosses are finished the variable 
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match is set to zero in Line 10, and then Lines 11 through 15 check to see if s 
(1) equals any of the last five sums. If a match is found, match is set equal to 1. If no 
match is found, match remains equal to 0. Lines 16 through 18 increment bingo by 
1 if match is zero. Line 20 computes, after the ten millionth game is completed, the 
probability of no match. 

As a fun little twist to this problem, suppose that rather than computing the sum of 
the two dice we instead compute their (positive) difference. How does that change 
the answer? You should find that the Monte Carlo code requires only a single 
(obvious) change to line 08, while a theoretical calculation requires an entirely 
new analysis. This feature, alone, is a major reason for the attractiveness of the 
Monte Carlo method. (Partial answer: the probability changes by a Jot!) Chap. 1, and 
Appendix H, include additional examples of Monte Carlo simulations. 


Appendix B: Wallis’s Infinite Product for Pi 


Suppose you are given a finite degree polynomial 
f(x) =L+ayx+ aox? +... + ayx”, 


where it is clear that f(0) = 1. The ne degree equation f(x) = 0 has n roots that P’Il 
write as 1, Tf, 13, ... fy. We can then write 


r(x) = (1 x) (1 x) (1 X)...(1-) (B.1) 


because the product on the right of (B.1) is a polynomial of degree n that has value 
1 at x = 0 with roots ry, fo, 13, ... Tp. 

With all this in mind, Euler had one of his numerous (but always brilliant), 
seemingly outrageous insights (insights that made him famous in the world of 
mathematics) concerning the power-series expansion of sin(x): 


; ox i oa 
sin(x) =x art ay peo =al att 5 nt): 
This says sin(x) is the product of x and an infinite degree polynomial, with the 
polynomial’s value at x = 0 equal to 1. ‘So,’ reasoned Euler (in 1734, the year after 
De Moivre deduced the value of the probability integral), ‘let’s write that infinite 
degree polynomial factor just as we did in (B.1) for a finite degree polynomial.’ That 
is, let’s write 


? Claiming the product on the right of (B.1) is f(x) is the mathematical equivalent of the assertion ‘If 
it looks like a duck, walks like a duck, and quacks like a duck, it’s a duck.’ 
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sin(x) =x| (1 x)(1 x)(1 *)...(1-2)..]. (B.2) 


The roots rj, f2, 13, ... tf, are the values of x that give sin(x) = 0, and so are —a, 1, 
— 2n, 2n, — 3m, ... and so on. We exclude 0 as a root of the polynomial because that 
root of sin(x) is taken into account by the lone x in front of the square brackets of 
(B.2). Thus, 


Or, 


sin(x) =| (1 =)(1 a) (1 a) ---) (B.3) 


In particular, if we set x = 5 then (B.3) becomes 


2 
NT7~ m/4\ — ty7~ An? — 1 
1= 37. (1-59) =FIL.(1- igi) = SIL. (a 4n2 ) 


ey (2n + 1) 
=*TT",! n)(2n) 


or, at last, 


oo (2n )(2n) Tt 
Tlie Gao py = 2° (B.4) 


That is, writing (B.4) out in detail, 


The infinite product in (B.4) had actually been discovered long before Euler was 
born—by the English mathematician John Wallis (1616-1703) in 1655—but by 
means that can only be described as a sorcerer’s mixture of obscurity and audacious 
inspiration. Euler’s infinite product derivation is much less obscure (but still pretty 
heavy on inspiration). 

Another expression of great interest is tucked away in Wallis’s infinite product. 
You'll notice that, for n running through the values | to k, with k any integer, all the 
integers in the numerator (2, 4, 6, ..., 2k) of the product in (B.4) appear in pairs, 
while the integers in the denominator (3, 5, 7, ..., 2k + 1) appear in pairs only up to 
2k — 1. The last integer in the denominator, 2k + 1, appears without a mate. So, if we 
take the square-root of Wallis’s product we have 
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feet (2)(4)(6) ... (2k) _ fk 
k— 00 (3)(5)(7)... (2k — 1) 2k +1 2 


(B.5) 


With a bit of algebra, we can cast (B.4) into yet another useful form: 


nm (2 2\/(4 4\(6 6 ; 1 n (2k)? 
5= (5) 5°5) (5-5) altar Tbe ape 


dq (2k)? (2k? 2K 
en . = lim Tia. .aNpaa 
noo 2n + ae (2k — 1)? (2k)? n—>oo2n + rh 7K pepe 


or, 


2(n!)* ek 
a (2n+1){(2n)}}? 2° (B.6) 


We’ll use (B.6) in Appendix C, and we’ll see it again in Chap. 14, the book’s final 
derivation of the probability integral. 


Appendix C: Stirling’s Asymptotic Formula for N! 


Stirling’s formula has many important applications in many places of pure and applied 
mathematics. Its proof is difficult.* 
—The first sentence is true. The second is not. 


The derivation presented here is an extensive elaboration of a note published more 
than 70 years ago." It has the attractive pedagogical feature of requiring nothing 
beyond freshman calculus. Indeed, as Professor Coleman writes in his first para- 
graph, “It [the derivation] could ... be used in a first course in Calculus.” We start 
with Fig. C1, which shows a sketch of y = In (x) for x > 1. Since the slope of the 
curve is the derivative of y (which is 1/x ), we see that the slope of the curve is 
everywhere positive and decreasing with increasing x. That is, the curve bends in 
such a way that the dashed chord BC lies everywhere below the curve. Therefore, the 
area of the trapezoid ABCD is less than the area under the curve from x = Xo to 
X=xXotl. 

The area of the trapezoid is 5 (In(xo) + In(xo + 1)]. Indeed, we can write similar 
expressions for all the unit-width trapezoids on the intervals x = | to 2, x = 2 to 
3, x = 3 to 4, and so on, all the way up to the unit-width trapezoid on the interval 


3Ralph Palmer Agnew, Differential Equations, McGraw-Hill 1960, p. 93. Agnew (1900-1986) was 
a professor mathematics at Cornell University. 


4A. J. Coleman, “A Simple Proof of Stirling’s Formula,” The American Mathematical Monthly, 
May 1951, pp. 334-336. We’ll encounter Professor Coleman again in Chap. 3. 
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Fig. C1 The area of ay 
trapezoid ABCD is less than 
the area under the curve 
from x = Atox =D 


x =N-— 1 tox =N (a total of N — | trapezoids). So, we can write the total area of 
these N — | trapezoids as 


{in(1) + In(2)] + 5 [tn(2) + In(3)] + 5 (3) + In(4)] +... +5 flm(N ~ 1) + In(N)] 
=F [n(2) + in(2) + In(3) + In(3) +... + In(N~1) + In(N~1)] +5 nN) 

= In(2) + In(3) +... In(N~1) +3 In(N) 

= In{(2)(3)...(N—1)} + In(N) — 5 ny) 


1 


= In{(2)(3) ... (N= 1)N] = 5 In(N) = In(N!) — 5 n(N). 


a 

2 
Let’s next write Cy as the amount by which this total trapezoid area is less than the 
area under the curve from x = 1 to x = N. That is, 


cx= f In(x)dx — [in(Nt) — 5 In(N) (C.1) 


From Fig. Cl we can see that with the addition of each new trapezoid the sliver of 
area between the curve and newest trapezoid is ever-smaller (but always positive). 
So, Cy monotonically increases with increasing N (you’ll see the significance of this 
observation, soon). 

Now, consider Fig. C2, which shows two unit-width trapezoids (AECD and 
ABGD), each of which has an area greater than the area under the curve from 
x = A to x =D (because both top edges EC and BG lie completely above the curve). 
These two trapezoids are generated by drawing the tangent lines to the curve at B and 
C, and then making the vertical extensions of AB and DC. E is the intersection of 
AB’s extension with the tangent line at C, and G is the intersection of DC’s 
extension with the tangent line at B. 
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Fig. C2 Trapezoids ABGD ay 
and AECD each have area 
greater than the area under 
the curve from x = A to 
x=D 
> a. 
The slope of the tangent line at C is 1 and so the equation of that tangent line is 
ot 


Y= xX +b. As y = In (Xp + 1) at x = Xo + 1, we have In(xp + 1) = 1 + b and so 
b = In (% + 1) — 1 and the equation of the tangent line at C is 
y= xnxt In(xpo+1)—1. At x = Xo, then, we see that AE= 22 + 


xotl 
In(xo + 1) —1= In(xo + 1)4+-® aa and so 


1 
AE= In(Xo + 1) —— Se 1 
The slope of the tangent line at B is = and so the equation of that tangent line is 


y= ax + bor, as y = In (Xo) at X = Xo, we have In(xp) = 1 + b. Thus, b = In (Xo) — 1 


and the tangent line at B is y= a + In(xo) — 1. At x = xo + | then, we see that 
DG= sot! + In(xo) — 1 = In(xo) + 2 So, the areas of the two trapezoids are: 


1 
xo t+ 1 


1/2 
xo tl 


area of AECD= 5] {In(xo + 1) \ + {In(xo 4 »)| = In(xo + 1) — 


and 


area of ABGD = 3 [ In(xo)} + {tn(x0) e =} =e ue 


The average of these two trapezoidal areas is 


> 


ee ee 1 
Xo +1 In(xo) 2 2 tmx) In(xo 1) 3(z aH) 


and since each trapezoidal area individually exceeds the area under the curve from 
X = Xg to X = Xq + I, then obviously so does their average. 
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If we now sum over all the average trapezoidal areas as x varies from | to 2, 2 to 
3, 3 to 4 and so on up to N — | toN, then we can write 


[ In(x)ax < 5 [In(1) + In(2) +3 (F — 5)] +3 [in(2) + im(3) + 5 (5 - 3)| 


[in(3) + In(4) + , G = )| 4: 
1 
2 


+ [in(N — 1) + In(N) +5 ( x)| 


or, 


+a[(1-3)+G-3)+G-a) +--+ Gawd) 


= Inf(2)(3)(4)...(N=1)] + In(N) = 5 In(N) + 5 [I - 5] 


or, 


.N 
1 1 1 
YY eee ee 
/ In(x)dx < In(N!) 5 In(N) + 4” AN’ (C.2) 
Looking back at (C.1), we see that 
" 1 
| In(x)dx = Cy + In(N!) — 5 In(N) 
1 
and so (C.2) says 
Cy + n(n!) — A ncn) < n(n) — dn(ny +2 — 
: 2 “12 4 4N 
or 
1 1 
Cy < 4 4N 
or, as N > 2, we can write 
1 
Cn <=. (C.3) 
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Equation (C.3) tells us that while Cy increases monotonically with increasing N 
(remember that?), it is also true that Cy is bounded from above. It should be 
intuitively clear, then, that Cx approaches a limiting value that we’ll call C: 
vim Cn =C. 


If we return to (C.1) and actually do the integral there, we have 


Cy = {xln(x) — x}! — In(N!) +5 In(N) =Nin(N) —N +1 — In(N!) +5, In(N) 


1 
2 


= (+2) incny — n(n) —N41=n( 8") wy 
(N+5) : 


and so, assuming N is sufficiently large that we can write C in place of Cy, we have 


NN4 ‘ig NSH 
c= in“) ~ ne )+1=In( Say) +1 


and so 
1-—C=-In (Re) 

Now, write 

anet-cne CH) _ (NY (C.4) 
That is, 

NleN N! 
a NSH NN EON 

or, 

N! ‘behaves like’ aNN*2e~N as Nx. (C.5) 


Equation (C.5) was known to De Moivre by 1730, but he didn’t know the exact value 
of a. (As it stands, as given by (C.3) and (C.4), all we know at this point is that 
el-4 <a<e!~°=e. In other words, a is somewhere in the interval 
2.117 < a < 2.718.) And then De Moivre’s friend, the Scottish mathematician 
James Stirling (1692-1770), calculated a= /2n = 2.5066—and that was enough 
to get the entire expression named after him, even though De Moivre had already 


done everything but find the exact value of a. 
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Life in the math world isn’t always fair.” 

So, how to determine the value of a? 

As shown in Eq. (B.6) of Appendix B, we know from Wallis’ formula that as N 
goes off to infinity 


jee NN) ek 
N00 (2N+1){(2N)!}? 2 


and so, using (C.5) to write the factorials, we have 


24 fate)" 
1 . 
== lim 


—F oe. eat 2 
BNR oie: 1) {a(2ny"*4e-2n 


which, if you do all the cancellations (you should do this) reduces to 


or, 


a=v2n. 


So, at last, we have the misnamed Stirling’s formula: 


NI ~ V2nNNtHe-N, (C.6) 


You'll notice that in (C.6) Pve used the ~ sign and not an = sign. That’s because 
Stirling’s formula is an asymptotic formula, not an equality. That is, as N — oo, 
(C.6) does indeed become an ever-better estimate of N! in the sense that the relative 
error tends to zero, but in fact the absolute error increases without bound (Stirling’s 
formula underestimates N!). Figure C3 shows this, along with the fact that, even for 
small values of N, Stirling’s formula is remarkably good. 

Of Stirling’s formula, one mathematician enthusiastically (and correctly) wrote 
“This formula is remarkable because it provides an approximation of N! that consists 
of a non-integral expression involving the irrational numbers e and z. In this respect, 


>The English mathematician Karl Pearson had some interesting commentary on this (see note 4 in 
Chap. 1). 
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N! N! = Stirling’s estimate 
1,04220712 


2 
: 
24 1.01678399 
120 1.01397285 


5,040 1.01046565 


40,320 1.00929843 
362,880 1,00836536 
3,628,800 1.00760243 


Fig. C3 Stirling’s formula is a really good approximation 


“tt 


to somebody seeing it for the first time, it must be just as amazing as Euler’s identity 
linking the numbers —1, i, e, and 1 via the equation e” = —1. © 

Stirling’s formula is immensely useful in problems involving the factorials of 
large numbers (with the result being even larger numbers). Here’s an example of 
that.’ Suppose you are asked to estimate the number of fish in a large lake (this is a 
question that might arise in a wildlife preservation study). One actual method used is 
quite clever. First, capture 1000 fish, then tag each of them with a dab of red paint, 
and then release the fish back into the lake. Next, wait a few days to allow the fish to 
spread uniformly throughout the lake. Finally, again capture 1000 fish and see how 
many of them were in the first capture (how many have a dab of red paint). Suppose, 
to be explicit, that of the 1000 fish in the second capture 100 were in the first capture. 
What is the probability of that occurring, assuming there are N fish in the lake? It 
should be physically clear that the smallest possible value for N is 1900. 

The answer depends, of course, on the value of N, and so we are interested in the 
value of N that maximizes the probability of there being 100 fish in the second 
capture that were also in the first capture. If we call that probability P, then (recalling 
Appendix A) 


1000) /N— 1000 
p= (roo) ( 900 ) 
(:000) 
because () is the number of ways to get 100 tagged fish on the second capture 


from the 1000 previously tagged fish, tg a) is the number of ways to get the 


°Finbarr Holland, “A Leisurely Elementary Treatment of Stirling’s Formula,” Irish Mathematical 
Society Bulletin, Summer 2016, pp. 35-43. Professor Holland is an emeritus member of the 
mathematics faculty of the School of Mathematics at University College Cork (Ireland). We'll 
encounter Professor Holland again, in Chap. 14, when we discuss his particularly clever derivation 
of the probability integral. 

7T first came across this problem when reading the classic book by Princeton University mathe- 
matics professor William Feller (1906-1970): An Introduction to Probability Theory and Its 
Applications (3rd edition, volume 1), John Wiley & Sons 1968, pp. 45-46. 
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900 untagged fish in the second capture from all the untagged fish in the lake, and 
(ana is the number of ways to get 1000 fish on the second capture from ail the fish 
in the lake, tagged and untagged. Expanding the binomial coefficients, this gives 


_ _(1000!)?_, {(N— 1000)!}? 
~ 100!(900!)? NI(N — 1900)! ° 


(C.7) 


So, ‘all’ we have to do is calculate P for all values of N from 1900 to, say, 100,000 
and observe which N gives the largest P. Well, of course you can appreciate the 
difficulty that is going to cause! The trick to getting around the computational 
difficulty is to not directly calculate factorials, but rather to use Stirling’s formula 
to calculate the /Jogarithms of the factorials. Since we know, for m a positive integer, 
that m! ~ /2am™*e~™ then 


In(m!) ~ ; In(2x) + (m + 5) In(m) — m. (C.8) 
This results in In(P) being simply the sum of the natural logarithms that are found by 
applying (C.8) to the various values of m in (C.7): m = 1000, 900, 100, N, N — 
1900, and N — 1000. 

With the aid of a computer this is easy to do, and Fig. C4 shows the result as N 
varies from 1900 to 100,000. The plot is log-log, since both P and N vary over wide 
ranges. (All the arithmetic is done using the natural logarithms of (C.8), but 


-10° 


-10'F T 7 


log, (P) 


-107 F 


-10° LL 
10° 10 
N 


4 10° 


Fig. C4 Probability 100 tagged fish from the first capture of 1000 fish are also in the second 
capture of 1,000 fish, as a function of N (the total number of fish in the lake) 
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converted to base-10 logarithms for Fig. C4.) For example, for the assumption that 
N = 1900 fish in the lake we see that logjo(P) = —429 (P = 10 *”°, a very small 
probability®). The value of logio(P) peaks at N = 10,000, and that value is called the 
maximum likelihood value of N. Of course, the actual probability for N = 10,000 
exactly is still small, and what makes more sense is to say something like ‘the actual 
value of N is ‘probably’ in the interval 8000 to 12,000.’ 


Appendix D: The Gamma Function 


In two letters written as 1729 turned into 1730, Euler introduced what is called the 
gamma function, I'(n), defined today in textbooks by the integral 


rin)= f ey? dx, n> 0, (D.1) 
0 


This definition is the modern one (equivalent to Euler’s’), dating from 1809, and is 
due to the French mathematician Adrien-Marie Legendre (1752-1833). Putting 
n = | into (D.1) it is an easy integration to see that 


ray= f e *dx={-e *}] =1. 
0 0 
Then, using integration-by-parts on 
T(n+ 1) =| e *x" dx, 
0 
where we define u = x" and dv = e * dx (and so du = nx" ‘dx and v = —e *), we 


get 


T'(n+ 1) ={-x"e~*} |p + fe nx"! dx =n f e-*a7—* dx. 
0 


That is, we have the so-called functional equation for the gamma function: 


®For the specific case of N = 1900 fish, Feller says the probability is “the order of magnitude 
108°,” in good agreement with Fig. C4. 


°EBuler’s original formulation was n! = TiS x)}"dx, which he found after a somewhat ‘hand- 
wavy’ analysis that you can read about in P. J. an “Leonhard Euler’s Integral: a historical profile 
of the gamma function,” American Mathematical Monthly, December 1959, pp. 849-869. 
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T(n+ 1) =nf(n). (D.2) 


So, in particular, for n a positive integer, 


and so on. In general, you can see that for positive integer values of n 
T(n+ 1) =n! 
or, equivalently, 
T(n) =(n—1)!,n>1, (D.3) 


Notice that substituting n = 1 into (D.3) says (1) = 0! = 1. Notice, too, that there is 
no requirement in the integral definition (D.1) that n has to be an integer. The gamma 
function is intimately related to the factorial function (and this connection was, in 
fact, the original motivation for Euler to define '(n) as he did, as a way to extend the 
factorial concept from applying to just the positive integers to covering all the real 
numbers, particularly the non-integer and negative reals). 

Changing variable in (D.1) to x = y (and so dx = 2y dy) we have 


r(n)= f ey" 29y dy=2f ey"! dy, 
0 0 


We get another true equation if we replace n with m, and the dummy integration 
variable y with the dummy integration variable x, and so 


T'(m) =2f eo xml dy, (D.4) 
0 
Thus, 


T'(m)I'(n) = / emit | e~Yy2—! dy 
0 0 


= 4/ | e (x+y?) ,2m in ldx dy. 
0 0 


This double integral may look pretty awful at first, but the trick that unwinds it is to 
switch from Cartesian coordinates to polar coordinates. That is, we’ll write 
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1’ = x* + y” where x = rcos (0) and y =r sin (0), and the differential area patch dx dy 


in rectangular coordinates transforms to the differential area patch r dr dO in polar 
coordinates.’ When we integrate over the region 0 < x, y < 00 we are integrating 
over the entire first quadrant of the plane, and in polar coordinates that region is 
defined by 0 <r < co and 0<0< 5. So, 


e4 
2 


T(m)[(n) = 4/ 


i e~' {r cos(0)}°"~ '{r sin(0)}"~' r dr do 
Jo Jo 


or, 
rimr(a)= [2 i Prot] 2 | “cos2"—1(@) sin2"-"(Q)d0]. (D.5) 
0 0 


Now, examine the first integral in square brackets on the right in (D.5). If you 
compare 


lo.@) 
2f e —1? .2(m+n) = ldr 
0 


to (D.4), you see that it is the same if we associate x + r and m «+ (m +n). Making 
those replacements, the first square-bracket term in (D.5) becomes 


2f oe? 20) — 1dr =T(m + n). 
0 


Thus, 


T(m)P'(n) =(m + n) 2 i cos“ ~1(9) sin 2"— '0)a0] (D.6) 
For example, with m=n= 5 in (D.6) we have 
r@r@)="@) rp [9] =r 2G)] =r 


or, as (1) = 1, 


'©You’ll see more about this trick in Chaps. 6 and 10. When transforming from one coordinate 
system to another, one generally has to compute something called the Jacobian determinant—after 
the German mathematician Carl Jacobi (1804—1851)—but here, in going from rectangular to polar, 
Iam counting on my assumed reader’s understanding from high school plane geometry, trigonom- 
etry, and AP-calculus of how those two particular systems are related. 
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r (5) =. (D.7) 


One can say a Jot more about the gamma function,'' but this is all we’ll need in this 
book for our discussions of the probability integral. 


Appendix E: Leibniz’s Formula for Differentiating 
an Integral 


Suppose we have the integral 


where « is the so-called parameter of the integral (the dummy variable of integration 
which is, of course, x, is not a parameter), and we wish to calculate the derivative of I 
with respect to a. We do that in just the way you might expect, from the very 
definition of the derivative: 


dl. = (a + Aa) — I(x) 
da Aa j 

Now, since the integration limits a and b depend (in general) on a, then a Aw will 
cause a Aa and a Ab and so we have to write the numerator as 


iether) i, gar eee ee [i oa 


a+Aa 


b+Ab atAa b 
=| f(x, a + Ac) ax— | f(x, + Aa) ax— | f(x, «) dx 
: bab . 
=| f(x, a + Ac) a+ | f(x,a + Aa) dx 
: a+Aa : b 
-{ f(x, a + Aq) ax | f(x, a) dx 
b : b+ Ab 
-{ {f(a + Aa) ~ F(x, a)}ex+ f f(x, a+ Aa) dx 


a+Aa 
— / f(x, a + Aa) dx. 


I See, for example, the short classic by Emil Artin, The Gamma Function, Dover 2015. 
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As Aa — 0 we have Aa — 0 and Ab — 0, too, and so 
b 
lim {Ha + Aa) —I(a)} = sim, | {f(x, a + Aa) — f(x, a) }dx + f(b, «)Ab — f(a, x) Aa 


where the last two terms follow because as Aa — 0 and Ab — 0 the value of x over 
the entire integration interval remains practically unchanged at x = a or at x = b, 
respectively. Thus, 


dl . I(a+ Aa) —I(a) 
da as Aa 
= lim a * FEC a+ Aa) —f(x,a)}dx+ lim f(b Fie 
Aa—0 Aa J, , > Aa—0 \” ° Aa 


F Aa 
7 lim fla, a) Aa 


or, taking the a inside the integral (the Riemann integral itself is defined as a limit, 
so what we are doing is reversing the order of two limiting operations, something a 
pure mathematician would want to justify), we have 


b 
Of db d 
auf 5, dx + F(b,0) 5° — f(a.a) 5. (E.1) 


(E.1) is the full-blown Leibniz formula—after the German mathematician Gottfried 
Leibniz (1646—1716)—for how to differentiate an integral, including the case where 
the integrand and the limits are functions of the parameter a. If the limits are not 
functions of the parameter (such as when the limits are constants) then the last two 
terms vanish and we simply (partially) differentiate the integrand under the integral 
sign with respect to the parameter « (not x). 

As a simple example of the use of Leibniz’s formula, let’s calculate the value of 
fi In(x)dx. This is a pretty straightforward calculation if you recall the first-year 
calculus formula for the indefinite integral: 


/ In(x)dx =x In(x) —x 


which is easily verified by simply differentiating the right-hand-side. This tells us the 
answer to our question is 


i In(x)dx = {x In(x) —x}} ={In(1) — 1} ~ limx In(x) = —1 


1 
0 
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because limx In(x) =0.'* So, does Leibniz’s formula agree? Here’s how to see that 


x 
it does. 
Leibniz’s rule requires a parameter with which to differentiate with respect to, and 
so there is our first issue—where is the parameter in is In(x)dx? Well, there isn’t 
one—-so let’s just put one in! By that I mean let’s study, instead, the integral 


1 
I(t) = [ x'dx, 
0 


which appears to be unrelated to our original integral. This is, in fact, one of the 
features of Leibniz’s rule that is the hardest to learn to work with (the identification 
of how to introduce a parameter). This requires experience, skill, and (to be honest) 
often some luck. This is what takes it to a level above that of being a mere rote 
application. In any case, let’s continue by writing 


\ 1 
I(t) -{ eax= | et MO dx, 
0 0 


Then, with t as the parameter, 


dl T r 
= =| In(x)e' mshi — f x' In(x)dx 
dt F 


0 


and we see that at ,—o 1S Our original integral. That is, 


1 
dl 
| In(x)dx = a 


=0 


t=0 


But computing a is easy, because 
1 tt |! 1 
I(t) =| xdx= a 

0 El), - ta 
and so 

dI _ 1 

dt (E41) 
'? This limit is made obvious by writing x = 4and so lim. x In(x) = lim, 4 In(t) = - Jim, mtn) =0 


because log(n) grows (much) more slowly than does n. 
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From this we have 


1 
1 

= |t=-0= In(x)dx = 
7 &) (t+ 1) 


just as we calculated at the start of this example. 

As a second, more elaborate example of the Leibniz formula, here’s how it 
evaluates an integral by finding a differential equation for which the integral is the 
solution (even if you have never seen a differential equation before, everything you 
need to know is in this appendix). Consider, as did Laplace in 1810, the evaluation of 


I(a) = 7 w Sslea a (E.2) 


= 1, 


t=0 


where a and b are each positive (we'll take a as the parameter and treat b as a 
constant). If we integrate by parts, writing 


1 
u= —,,dv= cos(ax)dx 
x2 + b? oe 
then 
re 2x ies sin(ax) 
(e+e 
and so 


i sin(ax) 7 gp gee x sin(ax) . 
I )={ se +5 f (x2 +b?) d 


or, as the first term on the right vanishes at both the upper and the lower limits, we 
have 


I(a) = ees dx (E.3) 


a +b?) 


and so it perhaps looks as though we are making things worse! As you’ll soon see, 
we are not. 
Next, multiply through (E.3) by a to arrive at 


al(a) 20" ey dx (E.4) 
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and then differentiate with respect to a to get!* 


oo y2 
a dlla) i= 2 x costa) a 
da 0 (x?+b’) 


The integrand can be re-written in a partial fraction expansion: 


x? cos(ax) _ cos(ax) —_b* cos(ax) 


(x24b7)? +b? (x2 402)? 


Thus, 


af x2 este ix=2f~ ae ax—2y f cos(ax) , dx 
0 (x? + b’) 0 x7+b 0 (x? + b’) 


and, since the first integral on right is I(a), we have 


dI(a) a) = 21a) — 2b? *  cos(ax) “ 
a9 + (a) =21(a) 26? | ee 
or, 
alla) _ a) = —2b? °  cos(ax) . 
7, iG 20? | a (E.5) 


Differentiating (E.5) with respect to a, we get 


ot) , dla) dI(a) = 20 [" x sin(ax) - 
m { 


da? da da gi b?)” 
or, 
dI(a ° x sin(ax 
so) =20° | Ba (E.6) 
da 0 ( x24 b’) 
'3To do this we are assuming at the upper limit that a) = 0, and this assertion is open to serious 


objection! Let’s ignore that issue for now, but when we get to Chap. 10 Pll say more about it. 
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If you look back at (E.4), you’ll see that we found the integral on the right in (E.6) to 
be 


x2 +b)" 
and so 
d?I(a) 2a 


or, rearranging, we have the following second-order, linear differential equation for 


I(a): 


d’I(a) 
da? 


—b? I(a) =0. (E.7) 


Such equations are well-known to have exponential solutions—I(a) = Ce‘*, where C 
and k are constants—and substitution into (E.7) gives 


Ck’e — b*Ce“*# =0 


and so k? — b”? = 0 ork = +b. Thus, the general solution to the differential equation 
is the sum of these two particular solutions: 


I(a) =Cye” + Coe. (E.8) 


We need two conditions on I(a) to determine the two constants C, and C2 , and we 
can get them from our two different expressions for I(a): the original (E.1), 


Appendices 153 


and from (E.3) we see that lim I(a) =0. 
Thus, from (E.8) we have 


and 


We’ll do this particular integral again, two more times, in Appendix F (as a contour 
integral) and Appendix G (using Fourier transforms), to illustrate those new tech- 
niques for evaluating the probability integral. 

Leibniz’s formula has, in recent years, had the label ‘Feynman’s trick’ attached to 
it—after the American physicist Richard Feynman (mentioned in the Preface) who 
popularized its use—but mathematicians were aware of the technique long before 
Feynman’s great-great grandparents were born. Feynman, himself, often told the 
story of how he had learned the ‘trick’ as a teenager from a college math book given 
to him by his high school physics teacher.'* 


Appendix F: Contour Integration in the Complex Plane 


F1: Before We Start 


As I start this essay (the most extensive appendix in the book) on contour integration, 
Pll assume only that you are familiar with complex numbers and their manipulation. 
The first several sections will lay the theoretical groundwork and then, quite sud- 
denly, you’ll see how they all come together to give us the beautiful and powerful 
technique of contour integration. Unlike the previous appendices, this one is suffi- 
ciently long that I have divided it into sections. None of these sections is individually 
very difficult, with all being at the freshman calculus level, but each is absolutely 
essential for understanding Chap. 12. 


'4For more on the ‘Feynman’ trick’ legend, see my book Inside Interesting Integrals (2nd edition), 
Springer 2020, pp. 52-55. 
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F2: Line Integrals 


Imagine two points, A and B, in the two-dimensional x,y plane. Further, imagine that 
A and B are the two end-points of the curve C in the plane, as shown in Fig. F2.1. 
A is the starting end-point and B is the terminating end-point. Next, suppose that we 
divide C into n parts (or arcs), with the k-th arc having length As, (where k runs from 
1 ton). Each of these arcs has a projection on the x-axis, where we’ll write Ax, as the 
x-axis projection of As,. In the same way, we’ll write Ay, as the y-axis projection of 
As,. Finally, we’ll assume, as n — oo, that As, — 0, that Ax, — 0, and that 
Ay, — 0, for each and every k (that is, the points along C that divide C into n arcs are 
distributed, loosely speaking, ‘uniformly’ along C). 

Aside: Soon, we’ll call the x and y axes the real and the imaginary axis, 
respectively, of the complex plane, with the y-axis measured in units of the imag- 
inary i= V —1. This will all be elaborated on in the following sections. 

Continuing, suppose that we have some function h(x,y) that is defined at every 
point along C. If we form the two sums )7y_ (xx, y,)AXk and S7y_ (Xx, ¥,) AY, 
where (Xx, Yx) is an arbitrary point in the arc As,, then we’ll write the limiting values 
of these sums as 


tim So, he, yn) AX = 2 h(x, y)dx =I (F2.1) 
and 
sim J) nO) = f h(xy)dy =I. (F2.2) 


AYK 
e 


Ay 


Fig. F2.1 A curve in the plane, and its projections on the x and y axes 
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Fig. F2.2. Two different Ya 
line integral paths 
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The C’s at the bottom of the integral signs in (F2.1) and (F2.2) are there to indicate 
that we are integrating from A to B along C. We’ll call the two limits in (F2.1) and 
(F2.2) line integrals (sometimes the term path integral is used, commonly by 
physicists). If A = B (that is, if C is a closed loop'”) then the result is called a 
contour integral. When we encounter contour integrals it is understood that C never 
crosses itself (such a C is said to be simple). Further, it is understood that a contour 
integral is done in the counter-clockwise sense; to reverse the direction of integration 
will reverse the algebraic sign of the integral. 

The value of a line integral depends, in general, on the coordinates of A and B, the 
function h(x, y), and on the specific path C that connects A and B. For example, 
suppose that A = (0, 0), B = (1, 1), and that h(x, y) = xy. To start, let’s suppose that 
C=C, is the broken path shown in Fig. F2.2. The first part is along the x-axis from 
x = 0 to x = 1, and then the second part is straight-up from x = 1 (y = 0) tox = 1 
(y = 1). So, for this path we have y = 0 along the x-axis (and so h(x, y) = 0), and 
X = | on the vertical portion of C; (and so h(x, y) = y). Thus, our line integral on this 
path is 


1 1 
=f hixy)dx= [ vax f y dx=0+0=0 
Cy 0 1 


and 


'S There are, of course, two distinct ways we can have A = B. The trivial way is if C simply has zero 
length, which immediately says I, = I, = 0. The non-trivial way is if C goes from A out into the 
plane, wanders around for a while, and then returns to A (which we re-label as B). It is this second 
way that gives us a closed loop. 
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0 1 1 
1 
= [ hx.y)dy= [ oay+ [ y dy = (5y") 
Ci 0 0 


0 


1 


Along the path C), on the other hand, we have y = x from A to B, and so h(x, y) = x” 
(or, equivalently, y*). So, on this path the line integrals are 


1 1 
1 1 
b= f hixy &x= [x dx= =x? =z 
j h(xy)dx= fx? dx=(3x°)) = 3 

and 

@ Lai: 2 
Lf hix.y)dy= [ y dy= (3°) =a 

ron 0 0 


Clearly, the values of the I,, I, line integrals are path-dependent and, for a given path, 
the I,, I, line integrals may or may not be equal. We can combine the I, and I, line 
integrals to write the line integral along C as Ic = I, + ily, and so Ic, = is while 
Iq, = § +45 

Looking back at the previous section, notice that in Fig. F2.2 we could write the 
unbroken line segment AB as z = x + iy or, since y = xX, Z =x + ix = x(1 +7) and so 
dz = (1 + idx. Then, as h(x, y) = h(x, x) = x’, we have 


ae 
- 3 


Ic, = fra + idx =(1 +i) Gx) 3 


0 


which is just as we calculated before. 

For now, we’ll put aside these considerations and turn to expanding this discus- 
sion from functions of a real variable to functions of a complex variable. Soon, 
however, you'll see how this expanded view of functions will ‘circle back’—how 
appropriate!—to closed contour line integrals, and what we’ve done in this section 
will prove to be most useful. 


F3: Functions of a Complex Variable 


We'll write the complex variable z as 
z=x+iy (F3.1) 


where x and y are each real with each varying over the doubly-infinite interval —oo 
to +oo, and i= V/V —1. Geometrically, we’ll interpret z as a point in an infinite, 
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two-dimensional plane (called the complex plane) with x measured along a horizon- 
tal axis and y measured along a vertical axis. And we’ll write a complex function of 
the complex variable z as 


f(z) =f(x + iy) =u(x, y) + iv(x, y) (F3.2) 


where u and v are each real-valued functions of the two real-valued variables x and 
y. For example, if 


f(z) =z? =(x + iy)’ =x? —y? + i2xy 
then, in this case, u = x” — y* and v = 2xy. In x, y notation, we are said to be 
working in rectangular coordinates. 


F4: The Cauchy-Riemann Equations and Analytic Functions 


Complex function theory really starts with the study of what it means to talk of the 
derivative of f(z). In real function theory, the derivative of g(x) at x = xo is defined as 


g(Xo + Ax) — g(Xo) 


dg ag 
aS. AX et = (Xo). 


dx 


X=Xo ~ Ax 0 


We do almost the same thing with a complex function. Indeed, the formal definition 
for the derivative of a complex f(z) at z = Zo is 


df 


safes f(Zo + Az) —f(zo) 
dz 


Z=Z Az—0 Az 


=f’ (Zo). 


The vanishing of Az = Ax + iAy is, however, not quite as straightforward as it is in 
the case of a real variable. In that simpler case, where we let Ax — 0 to calculate 
g (Xo), Ax only has to vanish along the one-dimensional real axis. That is, Ax can 
shrink to zero in just two ways: either from the left of xp or from the right of xp. But 
in the complex case we must take into account that, since Zo is a point in the complex, 
two-dimensional plane, then Az can shrink to zero in an infinity of different ways 
(from the left of zp, from the right of zo, from below Zp, from above Zo or, indeed, 
from any direction of the compass). So, just how does Az — 0? 

Mathematicians consider the most condition-free definition possible for the 
derivative to be the best definition, and so their answer to our question is: we want 
f(z) to be the same independent of how Az — 0. To have this be the case, as you 
might suspect, comes with a price. If f = u + iv then the price for a derivative at 
Z = Zp that doesn’t depend on the precise nature of how Az — 0 is that u and v cannot 
be just any functions of x and y, but rather must satisfy certain conditions. If these 
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conditions are satisfied at z = Zp and at all points in a region (domain or neighbor- 
hood are terms that are also used) surrounding Zo, then we say that f(z) is an analytic 
function in that region. 

The conditions for f(z) to be analytic are called the Cauchy-Riemann (C-R) 
equations, which are actually pretty easy to state: at z = Zp it must be true that 


Ou Ov 
ax = By (F4.1) 
and 
ou Ov 


For example, suppose that f(z) = z. That is, f(x, y) = x + iy = u(x, y) + i v(x, y) which 
means that u(x, y) = x and that v(x, y) = y. Then, 


Ou Ou Ov Ov 
Ox leas Ox 0 ay : 


and we see that the C-R equations are satisfied. Indeed, since the C-R equations are 
independent of z (of zo) then f(z) = z is analytic over the entire finite complex 
plane.'° As a counter-example, of a f(z) that is nowhere analytic, consider 
f(z) =Z=x — iy, where Z is the conjugate of z. Then, 


ou Ou Ov Ov 
Ox 


and so (F4.1) is never satisfied. 

Under not particularly harsh requirements the C-R equations are necessary and 
sufficient conditions for f(z) to be analytic. To show that the C-R equations are 
necessary is not at all difficult. Since Az = Ax + iAy then to have Az — 0 requires 
that both Ax — 0 and Ay — 0. That is, to speak of the derivative of f(z) at z = zo 
means to calculate 


: f(xo + AX, yo + Ay) — f(x0, Yo) 
/ — 
‘ (zo) a hese eek Ax+ iAy 


‘©The word finite is important: f(z) = z blows-up as |z| — 00 and so f(z) is not said to be analytic at 
infinity. In fact, there is a theorem in complex function theory that says the only functions that are 
analytic over the entire complex plane, even at infinity, are constants. For constants, all four partial 
derivatives in the C-R equations are identically zero. 
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Now, out of the infinity of ways that both Ax and Ay can vanish, let’s consider just 
two. First, assume that Ay = 0 and so Az = Ax. That is, z approaches Zg parallel to 
the x-axis. Second, assume that Ax = 0 and so Az = iAy. That is, z approaches Zo 
parallel to the y-axis. If f(z) is to be unique, independent of the details of how 
Az — 0, then these two particular cases must give the same result. In the first case we 
have 


; _ : 
(20) = eee Ax 
= im {UC + AR Yo) +i ¥(x0 + Ax,¥o)} — {u(Ko,¥o) +7 VO%0.¥o)} 
Ax—>0 Ax 
— im {UC + AX, ¥o) —UlH0, Yo)} +i {v(x0 + AX Yo) —VO%0o)} _ Ou, ,dv 
im tin. 
Ax—>0 Ax Ox Ox 


And in the second case we have 


£(xo, Yo + Ay)— £(Xo; Yo) 


J 
P(%) = ieee ae iAy 
— jim L%oYo + Ay) + i v(X0, Yo + Ay) } = {ulXo, Yo) +4 ¥(Xo, Yo) } 
Ay—>0 iAy 
= jim 1L2(%o. Yo + Ay) — u(%o, Yo) } +t {v(Xo, Yo + Ay) = V(X, Yo)} _ Lou, Ov _ Ov du 
Ay>0 iAy idy Oy Oy dy 


Equating the real and the imaginary parts of these two expressions for f (zo) gives the 
C-R equations in (F4.1) and (F4.2). 
Analytic functions are clearly a rather special subset of all possible complex 
functions, but certain broad classes are included. They are: 
1. Every polynomial of z is analytic; 
2. Every sum and product of two analytic functions is analytic; 
3. Every quotient of two analytic functions is analytic except at those values where 
the denominator function is zero; 
4. An analytic function of an analytic function is analytic. 


So, from (1) f(z) = Zz” and f(z) = e” are both analytic, because in the first case f(z) is a 
polynomial and in the second case because the exponential can be expanded in a 
power series. From (2) f(z) = ze" is analytic, and from (3) f(z) = e7/(z" + 1) is 
analytic except at z = + i which are called the singularities of f(z) because, at those 
values of z, f(z) blows-up. And finally, from (4) f(z) = e® is analytic. 


F5: Green’s Integral Theorem 


In this section we’ll continue our the discussion of line integrals to derive what is 
called Green’s theorem.'’ We begin by imagining a closed path (contour) C that 


'7For the interesting history of this theorem, named after the English mathematician George Green 
(1793-1841), see my An Imaginary Tale, Princeton 2016, pp. 204-205. 
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Fig. F5.1_ A CCW contour Cc 
C and its interior R 


encloses a region R of the complex plane, as shown in Fig. F5.1. We further imagine 
that there are two real functions of the real variables x and y, P(x, y) and Q(x, y), 
defined at every point along C and in the region R (the interior of C). Then, Green’s 
theorem says that 


Fo{P(s.y) dx-+Q(s.y) dy} = ff {S2— Flax ay. (5.1 


The circle on the line integral in the left-hand side of (F5.1) is there to emphasize that 
C is a closed, non-self-intersecting path (a simple curve traversed in the CCW sense, 
as mentioned in Sect. F2). R is called a simply connected region, which means every 
simple closed curve in R encloses only points in R. If a region is not simply 
connected then it is said to be multiply-connected: an example is a simply connected 
region that has a hole cut in it. The points ‘in the hole’ are considered to be in the 
exterior of C. 

Green’s theorem relates a contour integral along C to an area integral over the 
interior of C. For the contour of Fig. F5.1 it’s pretty obvious where the interior of C 
is, but in more advanced work than we are doing in this book one encounters 
contours whose interiors won’t be so obvious. Here’s an easy, low-level way to 
always locate the interior of a C: as you walk along C in the CCW sense, imagine 
you drag both hands along the ground. Your /eft hand will be in the interior, while 
your right hand will be in the exterior of C. (The idea that a simple, closed curve 
divides the plane into two regions—its interior and its exterior—seems pretty 
obvious. But not so obvious that mathematicians have nonetheless felt the need to 
term it the Jordan Curve Theorem, after the French mathematician Camille Jordan 
(1838-1922), who stated it in 1887.) 

To prove Green’s theorem isn’t difficult, or at least it isn’t if we make some highly 
simplifying assumptions. These assumptions are actually not required, but to remove 
them complicates the proof. To start, our first assumption is that R is a rectangular 
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Fig. F5.2 When R is a y4 
rectangle 
C; 
Y1 @------- < < 
¥ A 
Cy R Cy 
Vv A 
YO @-+------ > > 
Cy 
1 1 
| | 
| | 
e ad © > 
XQ X] ‘i 


patch oriented parallel to the x and y axes, as shown in Fig. F5.2. (I’ve drawn the 
patch totally in the first quadrant, but that’s just the way I drew it—in all that follows 
the precise location of R is irrelevant.) The boundary edge of Ris C=C, +C2+C34+Cy, 
which simply means that C is made of four sides. 

Starting with the i) = oP dx dy term on the right-hand side of Green’s theorem, 


we have 


op, ef par 
[eae ff i sesh 
= / {P(x,y,) —P(x yo) }dx = i P(x, yo)dx + / P(x, y,)dx 
=| P(x, y) ax+ | P(x, y) dx. 
Cj 


C3 


Notice, carefully, that in the last two integrals I have dropped the subscripts on yo 
and y,, subscripts that were included in the earlier integrals. I can do that because the 
subscripts were originally there to distinguish between integrating along the lower 
edge (yo) or along the upper edge (y,) of R, and that job is now done in the last two 
integrals by writing C, (the lower edge) and C3 (the upper edge) beneath the 
appropriate integral sign. Notice, too, that writing i dy = P(x, y,) — P(x, yo) 
makes the assumption that there is no discontinuity in & , that is, the partial 
derivative is continuous. 

Similar integrals with respect to x can be written for the other two edges (C> and 


C4) as well and, since those are vertical edges, we know that everywhere along them 
dx = 0. That is, 
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and 


Xo 
[ P(x, y) dx =| P(x, y)dx =0. 
C4 xo 
Since those integrals vanish we can formally add them to our C, and C; integrals 
without changing anything. So, 


If.- sex ay= [PCs y) dx + [ro dx + [P0. y) dx +f P(x, y) dx = P(x, y) dx. 


Cy 


If you repeat all the above for the ff me dx dy term in Green’s theorem, and 
observe dy = 0 along the horizontal edges C; and C3, you should easily see that 


0 
J] so ex ey=F-O0.y) a, 
R 


and that completes the proof of Green’s theorem for our nicely oriented rectangle in 
Fig. F5.2. In fact, however, this proof can be extended rather easily to other much 
more complicated shapes for R. Using this same basic idea, we can build very 
complicated shapes out of appropriately arranged rectangles and, since Green’s 
theorem works for each rectangle, then it works for all of them together and so 
Green’s theorem works for their composite (and perhaps quite complicated) 
region R. 


F6: Cauchy’s First Integral Theorem” 


Well, it’s taken a bit to get to this point, but it will soon be clear it was worth the 
effort. Our basic result is easy to state: if f(z) is analytic everywhere on and inside C 
then 


'8By convention, the theorem in this section is named after Cauchy who published it in 1814, but 
in a letter dated December 1811, written by Gauss to his fellow German mathematician Bessel (who 
you will remember from note 3 in Chap. 7), he states (without proof) the theorem we will prove 
here. In mathematics, alas for Gauss (as if he really needed more to add to his enormous resumé), 
credit goes to the first to publish. 
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fttoae =0. (F6.1) 
Cc 


To show this, recall (F3.1) and (F3.2). That is, with f(z) = u(x, y) + 7 v(x, y) and 
writing dz = dx + i dy, we have 


frtarae= f (urivylar + say) = f (ude+ iu dy + iv dx—v dy) 
c Cc Cc 


or, 


ftle)ae= p(w ax—vdy) +i 6 (v dx + u dy). (F6.2) 


Now, because of Green’s theorem, the two contour integrals on the right are each 
equal to zero. To see this, consider the first integral on the right-hand side of (F6.2), 
and look back at (F5.1). You see that we have P(x, y) = u(x, y) and Q(x, y) = — v(x, 
y), and so the partial derivatives on the right-hand side of (F5.1) are 


OQ dv OP ou 


Ox Ox’ dy dy’ 


The C-R equation of (F4.2), which holds here because we are assuming f(z) is 
analytic, says the integrand of the double integral in Green’s theorem is 


OQ OP _ ov du__ dv Ov =" 
Ox Oy Ox dy Ox oxy 


For the second integral on the right-hand side of (F6.2) we have P(x, y) = v(x, y) and 
Q(x, y) = u(x, y). So now 


OQ du OP_ dv 
Ox Ox’ dy Oy 


and the C-R equation of (F4.1) says the integrand of the double integral in Green’s 
theorem is 


OQ OP du ov dv Ov _ 
Ox Oy Ox Oy dy dy > 


0 


because, again, f(z) is analytic. So, (F6.1) is proven. 
There is no denying that (F6.1) looks pretty benign. But it has tremendous power. 
For example, consider the case of 
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Fig. F6.1 A contour that YA 
avoids a singularity at the 
origin 
iT 
C3y 
3 G 
i€ 
Cy ey 
@ > > 
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which is analytic everywhere except at z = 0 because there f(z) blows-up. So, if we 
integrate f(z) around any C that avoids putting z = 0 in its interior, we know from 
(F6.1) that we’ll get zero for the integral. With that in mind, consider the contour C 
shown in Fig. F6.1, where ¢ > 0 and T is finite, and the two arcs are circular. In the 
notation of Fig. F6.1, we have 


frtoa=[reae+ [react | reae+ | teaz=o (F6.3) 


For each of the four segments of C, we can write: 


on Cy: z = x, dz = dx; 

on C3: z= Te”, dz = iTe”d0, 0<0< 53 
on C3: z =i y, dz =i dy; 

on C4: z = ee”, dz = ieed0, > 8 > 0; 


Thus, (F6.3) becomes 


T Aix 5 ,iTe® © Ai(iy) 0 Qiee® 
e iS 40 e ; e . ap 
| ox +f 78 iTe"d@ + | = idy +f =o iee™ dd =0. 


Then, doing all the obvious cancellations and reversing the direction of integration 
on the third and fourth integrals (and, of course, their algebraic signs, too), we arrive 
at 
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T Gix ay, io aid Te-y 
ax tif f (e" — eve Jeo} — f ©" dy=0. 
€ ze 0 € y 


If, in the last integral, we change the dummy variable of integration from y to x, we 
then have 


T oie gx Cry. a3 
/ Mae actif [ (e™" - cao} <0. (F6.4) 
€ 0 


Now, focus on the second integral and expand its integrand with Euler’s identity: 


0 ice! ; F i gi i : i si 
ele lee” — ell cos(®)+isin(8)} eet cos(0)+i sin(8) } 


26> Tsin(®) ,iTcos(8) ee sin(0) et cos(0) ; 


If we let T — oo and e — 0 then the first term on the right goes to zero because 


jim e Tin) — ( for all O<0< 5 


while the second term goes to 1 because 


lime" = jim e#°S) — e° = 1 for allO<0< z 
e—>0 20 2 


Thus, (F6.4) becomes, as T > oo and e — 0, 


| ie a are q ['c 1ya0 =i 
0 aS 0 


or, using Euler’s identity again, 


[ cos(x) + isin(x) —e7 dx j= =0 
0 x 2 


or, 


; cos(x) —e7 axc+if sin(x) dx =i. 
0 x 0 x 2 


Equating imaginary parts, we have 
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a famous (and very important) integral first evaluated by Euler (sometime between 
1776 and his death in 1783) in an entirely different way, while equating real parts 
gives the remarkable 


You can see that the ability of contour integration in the complex plane to do real 
integrals, integrals like iPad and [ “ee , depends on the proper choice of the contour 
C. At the start, C encloses a finite region of the plane, with part of C lying on the real 
axis. Then, as we let C expand so that the real axis portion expands to —oo to oo, or 
to 0 to oo, the other portions of our properly chosen C result in integrations that are, 
in some sense, “easy to do.’ 


F7: Cauchy’s Second Integral Theorem 


When we try to apply Cauchy’s first integral theorem we may find it is not possible 
to construct a useful contour C such that a portion of it lies along the real axis and yet 
does not have a singularity in its interior. The presence of singularities inside C 
means that Cauchy’s first integral theorem no longer applies. ‘Getting around’ (pun 
intended!) this complication leads us to Cauchy’s second integral theorem: if f(z) is 
analytic everywhere on and inside C then, if zp is inside C, 


f 

bo earn f(Zo). (F7.1) 
cZ—Zo 

By successively differentiating with respect to z) under the integral sign, it can be 

shown that all the derivatives of an analytic f(z) exist (we’ll use this observation in 

the next section): 


where Zp is any point inside C and f™ denotes the n-th derivative of f. 

While f(z) itself has no singularities (because it’s analytic) inside C, the integrand 
of (F7.1) does have a first-order singularity’? at z = zo. Now, before I prove (F7.1) 
let me show you a pretty application of it, so you’ll believe it will be well-worth your 


‘©The singularity in (7.1) is called first-order because it appears to the first power. By extension, 
has a second-order singularity, and so on. When we do the probability integral in Chap. 12, a 
Z-Zo 


single, first-order singularity is all we’ll encounter. 
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Fig. F7.1_ A contour yA 
enclosing a single, first- 
order singularity 


> > > xX 
-T T 


time and effort to understand the proof. What we’ll do is evaluate the contour 


integral 
iaz 
fst 
cb* + 22 


where C is the contour shown in Fig. F7.1, and a and b are each a positive constant. 
When we are nearly done, we’ll let T — oo and you’ll see we will have derived a 
famous result. Along the real axis part of C we have z = x, and along the 
semicircular arc we have z = Te, where 9 = 0 atx = Tand0 =z at x = — T. So, 


iaz T iax Tt ia(Te”) ; 
ae tee Mere orcs 
Cc Z —T x 0 Z 


The integrand of the contour integral can be written in a partial fraction expansion as 


elaz _ elaz _ elaz 1 1 
b?+22 (z+ib)(z—ib))  i2blz—ib z+ ib)’ 


and so we have 


| elaz ay elaz galic. AN elax ce mn T ela(Te”) atiag 
i2b | Jo z—ib czZ+ib | f_pb?4+ x2 9 b2 + T2ei20 , 


Since the integrand of the second contour integral on the left-hand side is analytic 
everywhere inside of C—that integrand does have a singularity, yes, but it’s at 
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z = — ib which is outside of C, as shown in Fig. F7.1—then we know from Cauchy’s 
first integral theorem that the second contour integral on the left-hand side is zero. 
And once T > b (remember, eventually we are going to let T — oo) then the 
singularity for the remaining contour integral on the left is inside C, at z = ib. Thus, 


1 eiaz T elax T ea Tel® x96 
2b p pare foe ax + f bp Tenn Te dé. 


The integrand of the contour integral on the left looks exactly like f(z)/(z — zo), with f 
(z) = e and zp = ib. Cauchy’s second integral theorem tells us that, if T > b, the 
contour integral is equal to 27i f(Zo), and so the left-hand side of the last equation is 
equal to 


= Qni el*\) 


a —ab 
2b = 


That is, 


T elax 1 eia(Te®) . . 
| ere* [ bp Tene te do= pe, T>b. 


Now, if we at last let T — oo then, making the same sort of argument that we did 


concerning the line integral along the circular arc in the previous section, we see that 
the second integral on the left vanishes like t And so, using Euler’s formula, we 


have 
_ oe — °° cos(ax) [- sin(ax) 
—+——- dx = =e = dx +i dx 
i b i 0 b* + x? 


Equating imaginary parts we arrive at 


[- sin(ax) oe, 


oo b* + x? 


which is surely no surprise since the integrand is an odd function of x. Equating real 
parts gives us the far more interesting 


[ cos(ax) dx — Zena 


oo D* + x? —b 


first done via a non-contour method in 1810 by Laplace. 

Okay, here’s how to see what’s behind Cauchy’s second integral theorem. The 
proof is beautifully elegant. In Fig. F7.2 I have drawn the contour C and, in its 
interior, marked the point zo. In addition, centered on Zp I’ve drawn a circle C* with a 
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Fig. F7.2. A simple curve C 
(enclosing point Zo) 
connected to an inner circle 
C* via the cross-cut ab 


radius p that is sufficiently small that C* lies completely in the interior of C. Now, 
imagine that, starting on C at some arbitrary point (call it A), we begin to travel along 
C in the positive (CCW) sense until we reach point a, whereupon we then travel 
inward to point b on C*. Once at point b we travel CW (that is, in the negative sense) 
along C* until we return to point b. We then travel back out to C along the same path 
on which we traveled inward until we return to point a. We then continue on along C 
in the CCW sense until we return to our starting point A. 

Here’s the first of two crucially important observations on what we’ve just done. 
The complete path we’ve followed has always kept the annular region between C 
and C* to our left. That is, this path is the edge of a region which does not contain the 
point Zp. So, in that annular region from which z = zo has been excluded by 
construction, f(z)/(z — Zo) is analytic everywhere. Thus, by Cauchy’s first integral 
theorem, since z = Zg is in the exterior of the path we’ve traveled on as we moved 
(starting) from A and ending back at A, we have 


¢ Ae) 9% (F7.2) 
Cc 


ab, —C*,ba % — 20 


The reason for writing —C* in the path description of the contour integral is that we 
went around C* in the negative sense. 

Here’s the second of our two crucially important observations. The two trips 
along the ab-connection between C and C* (mathematicians call this two-way 
connection a cross-cut) are in opposite directions and so cancel each other. That 
means we can write (F7.2) as 
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f. -c pone = é fe dz é fe dz. (F7.3) 


The reason for the minus sign in front of the C* contour integral at the far-right of 
(F7.3) is, again, because we went around C* in the negative sense. The two far-right 
integrals in (F7.3) are in the positive sense, however, and so the minus sign has been 
moved from the —C* path descriptor at the bottom of the integral sign to the front of 
the integral, itself. 

Now, while C is an arbitrary simple curve enclosing Zo, C* is a circle with radius p 
centered on Zo. So, on C* we can write z = Zp + pe” (which means dz = iped0) and, 
therefore, as 6 varies from 0 to 2x on our one complete trip around C*, (F7.3) 
becomes 


f f "f(z + pe), 7 
cZ— Zo cr Z—Zo 0 pe 0 


If the integral on the far left is to have a value then, whatever it is must be 
independent of p. After all, the integral at the far left has no p in it! So, the integral 
on the far right must be independent of p, too, even though it does have p in it. That 
means we must be able to use any value of p we wish. So, let’s use a value for p that 
is convenient. 

In particular, let’s use a very small value, indeed one so small as to make the 
difference between f(z) and f(z), for all z on C*, as small as we like. We can do this 
because f(z) is assumed to be analytic, and so has a derivative everywhere inside C 
(including at z = zo), and so is certainly continuous there. Thus, as p — O we can 
argue f(z) — f(z) all along C* and thus 


¢ Jel dz=i | (29). 


Finally, pulling the constant f(zp) out of the integral, we have 


2n 
¢ f(z) de =i t(eo) [ d0 = 2ni f(Z0), 


Z— Zo 0 


which is (F7.1) and our proof of Cauchy’s second integral theorem is done. 


F8: Singularities and the Residue Theorem 


In this section Pll derive the wonderful residue theorem, which will reduce the 
probability integral to ‘routine’ status. We start with a f(z) that is analytic everywhere 
in some region R in the complex plane except at the point z = zp) which is a 
singularity of order m > 1. That is, 
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f(z) = fee (F8.1) 


where g(z) is analytic throughout R. Because g(z) is analytic we know it is ‘well- 
behaved,’ which is math-lingo for ‘all the derivatives of g(z) exist.’ (Take a look 
back at (F7.1) and the comment that follows it.) That means g(z) has a power series 
expansion about z = Zp and so we can write 


g(z) =co + ¢1(Z— Zp) + eo(z— Zp)? + 03(z— Z)° +... (F8.2) 


Putting (F8.2) into (F8.1) gives 


- en: oe C1 C2 
f( ) (z—z)™ (z—x)" 1 a (g—2)" 


Atera(a—agy ea 


or, as it is usually written, 


f(z) => jan(z— 2)" + ees e (F8.3) 


ea Z9)" 


where in the second sum all the b, = 0 forn > m. 

The power series expansion in (F8.3) of f(z), an expansion about a singular point 
that involves both positive and negative powers of (z — Zo), is called the Laurent 
series of f(z), named after the French mathematician Pierre Alphonse Laurent 
(1813-1854) who developed it in 1843. We can find formulas for the a, and b, 
coefficients in (F8.3) as follows. Begin by observing that if k is any integer 
(negative, zero, or positive), then if C is a circle of radius p centered on Zp (which 
means that on C we have z = Zp + pe’’), then 


2n 02 
folz = zo) ‘dz = | pXe™ ine! dO = okt f el(k+1)8q9 
) 0 


0 
S ipk*! ei(k+1)0 
i(k +1) 


As long as k # —1 this last expression is 0. If, on the other hand, k = — 1 our 
expression becomes the indeterminate . To get around that, fork = — 1 simply 
back-up a couple of steps and write 


2n 2n 
dee —z)~'dz= | Le ipe”’d0 = if dO = 2ni. 
Cc 0 pe 0 


That is, for k any integer, 


2 
2n _ pk! {eilk+1)04 is 
0 ~~ k+1 0° 
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0 k#—-1 
k > 
— dz = : F8.4 
fe-wha={ fe (F8.4) 


So, to find a particular a-coefficient (say, aj) in the Laurent series for f(z), simply 
divide through (F8.3) by (z — zo) * ' and integrate term-by-term. All of the integrals 
will vanish because of (F8.4) with a single exception: 


§ 0 dem f 9 te = 2H, 
c(z—2)" cZ— 20 


That is, 


f 
——! FR) a, j= 0,1, 2,35 2222 (F8.5) 
2ni Jo (z — 2)! 
And to find a particular b-coefficient (say, bj), simply multiply by (z — zw) ~ | 
through (F8.3) and integrate term-by-term. All of the integrals will vanish because of 
(F8.4) with a single exception: 


f tle) 2-20)! 'de= f (2-29) ld f *_ dz = 2nib;. 
fe Cc c2~ 20 


That is, 


_1 f{__ f(z) a 
"a hl ll i een (F8.6) 


One of the true miracles of contour integration is that, of the potentially infinite 
number of coefficients given by the formulas (F8.5) and (F8.6), only one will be of 
interest to us. That chosen one is b; and here’s why. If we set j = | in (F8.6) then”? 


fiileiae = 2nib,, (F8.7) 
c 


which is almost precisely (to within a factor of 2mi) the ultimate quest of our 
calculations, the determination of 


°° The contour C in (F8.7) has been a circle of radius p up to this point, but in fact by using the cross- 
cut idea of Fig. F7.2 we can think of C as being any contour enclosing zo such that f(z) is 
everywhere analytic on and within C (except at Zo, of course). 
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filer 


But, of course, we don’t do the integral to find b; (if we could directly do the integral, 
who cares about b,?!), but rather we reverse the process and calculate b; by some 
means other than integration and then use that result in (F8.7) to find the integral. 
The value of b, is called the residue of f(z) at the singularity z = Zp. 

What does ‘some means other than integration’ mean? As it turns out, it is not at 
all difficult to get our hands on b,. Let’s suppose (as we did at the start of this 
section) that f(z) has a singularity of order m. That is, writing-out (F8.3) in just a bit 
more detail, 


b 1) bm 
f(z) =... +a)(z—Z) + ag + + +...+——z: 
(2) i 0) Z—Zo (=m) (zZ—%) 
So, multiplying through by (z — Zo) gives 
(z—zo)" f(z) =... Fai(z zo)" 4 aq(z — 29)" + bi (z zm)! + bo(z 7%)? tb... + Dm. 


Next, differentiate with respect to z a total of m — | times. That has three effects: 
(1) all the a-coefficient terms will retain a factor of (z — Zo) to at least the first power; 
(2) the b; term will be multiplied by (m — 1)!, but will have no factor involving 
(Z — Zo); and (3) all the other b-coefficient terms will be differentiated to zero. Thus, 
if we then let z — z the a-coefficient terms will vanish and we’ll be left with nothing 
but (m — 1) ! by. Therefore, 


a = Wes _ {(z—z0)"£(z)} (F8.8) 


where Zp is a m-order singularity of f(z). 
For a first-order singularity (m = 1), the kind we’ll encounter with the probability 


integral, the formula in (F8.8) reduces, with the interpretation of a =lifm=1, 
to 


b; = lim (z — zo)f(z). 
ZZ 


Alternatively, write 
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where, as before, g(z) is analytic at the singularity z = Zp) (which is then, of course, a 
first-order zero of h(z)). That is, 


h(zo) =0. 
Then, 
om = tite ES te BO 
jim (z Zo)f(Z) = Jim Zo) (z) = jim h(z) — h(zo) 
Z—Zo 


where the denominator (which you no doubt recognize as the definition of the 
derivative of h(z)) on the far-right follows because h(z) = h(z) — h(zo) because h 
(Zo) = 0. So, 


im (z — Z)f(z — __8{%0) _ 
jim ( o)f( ) £h(2)|,_,, ; 


That is, the residue for a first-order singularity at z = Zo in the integrand f(z) = ee 


can be computed as 


b, = 220), (F8.9) 


(F8.9) is central to the evaluation of the probability integral in Chap. 12. 

To finish this section, Pll now formally state what we’ve been doing all 
through it: if f(z) is analytic on and inside contour C with the exception of N 
singularities, and if R; is the residue of the j-th singularity, then 


é f(z) dz=2ni S > Rj. (F8.10) 


j=l 


This is the famous residue theorem, that says, for each singularity, we pick-up a 
contribution to the integral of 2zi times the residue of that singularity, with each 
residue calculated according to (F8.8)—or (F8.9) if m = 1. That’s it! 

At least, that’s it for ws. There are more complications to contour integration, such 
as what to do about integrands that have singularities on the real axis where our 
contour partially exists, but none of those concerns occur in Chap. 12, and what we 
have developed here in Appendix F is all we will need to do the probability integral 
via contour integration. 
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Appendix G: The Fourier Transform Pair 


Suppose x(t) is a periodic, real-valued time function with period T. That is, for T 
some positive constant we have x(t) = x(t + T), — co <t < oo. We write the energy 
of x(t), W, in a period, as 


w= | x?(t) dt < oo. 
period 


This definition for the energy of a signal is motivated by an analogy from physics. If 
x(t) is a periodic voltage applied across a resistor R obeying Ohm’s law (if you don’t 
know what that is, don’t worry about it), resulting in the current i(t), then the 
instantaneous power is 


p(t) =x(i() =x(y 9) = © 29 


if R = 1 ohm. Since energy is the integral of power we have 


W= p(t) dt 


period 


and our definition of W as the integral of x*(t) follows, and this definition is used 

(by engineers, physicists, and mathematicians, alike) even if x(t) is not a voltage, 

even if the independent variable t isn’t time, and even if there isn’t a resistor in sight. 
The Fourier series”! of a periodic x(t) is the complex exponential series 


x(t) = oo Cpe’ Gg = i= v¥—1, (G.1) 


k=—oo 


where the c, are constants (generally, they are complex constants). (Using Euler’s 
identity to write the complex exponentials in terms of sines and cosines gives the 
common form of a Fourier series in terms of the trigonometrical functions.) How do 
we know we can express x(t) as in (G.1)? Suppose we write xn(t) as a truncated 
Fourier series with a finite number of terms, that is, for N some finite positive integer 


N 


Xn(t) = :S cxe 
k=—N 


Keot @o = 2n . 
T 


71 After the French mathematician Joseph Fourier (mentioned in the Preface). 
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Further, suppose we write Ey as the integrated-squared-error that x(t) makes as an 
approximation to x(t). That is, 


N 2 
- 7 [eo 7 av(OF fod 7 d. acto . 


k=—N 


En is the energy of the error. It can be shown” that if we pick the c, to be such that 
En is minimized, then: 
For any value of N, the resulting c, are independent of the value of N, and 


lim (min Ey) =0. 


N—- oo 


That is, the complex exponential Fourier series of x(t) tends toward having zero 
energy in the error made by xy(t) as an approximation to x(t) as we let N increase 
without bound. This result gives physical meaning to the complex exponential 
series, a series that, at first glance, may seem to be simply an abstract mathematical 
definition. 

It is easy to show that 


1 a — ik@ot 
C= = x(the “ dt. 
T period 


To see this, just multiply through (G.1) by e~ ‘"" and integrate over a period. You’ll 
see all the integrals vanish except for the case of n = k. Notice that while the c, are, 
in general, complex (since x(t) is real-valued), then the special case of k = 0 says 


rh 
C= a x(t) dt 
a period 


is always real-valued. The value of cp is, physically, the average value of x(t) over a 
period. Notice, too, that the conjugate of cy, cy, is (remember, x(t) is real) 


c= tf x(t)e~ oot ar} = if x*(t) (e~ Booty" dt 
period period 


he x(t)eXo dt—c_y. 


period 


We can use this result to show that, like co, all the other c, for k 4 0 also have 
important physical significance. We start by writing the integrand of the energy 
integral W as 


>? See The Science of Radio, Springer 2001, pp. 136-140, for the mathematical details. 
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00 00 00 
x(t) = 5 c efkoot _ , . CmCneint moot 
k= — oo m= —oo n= —oo 
and so 
00 0° oe) oe) 
W= 5 s CmCnelm moot qt = 5 S CmCn / ellm+n)oot qt, 
period m= —o0 n= — 00 m= —oo n=—oo period 


This last integral is particularly easy to evaluate. For any integration interval of 
duration one period, starting at the arbitrary time ¢, we have 


J 


vU+T 
stealth ei(mtn)aot ei(m+n)oo(t'+T) _ eilm+n)aot! 
e ‘0 t= 
t! 


~ im+n)oo) i(m + n)@o 
t! 


ei(m+n)aot! feeerayy _ 1} 


i(m + n)@o 
Now, recall from (G.1) that @pT = 2x, and that e“™ *"* = | for all integer m and n. 
Thus, the integral is almost always equal to zero. But not always, because for the 


special cases where n = — m the last expression becomes the indeterminate 7 . For 
those special cases, first set n = — m in the energy integral and then do the integral: 


‘ ot +T 
/ e° dt= / dt=T. 
period t’ 


Thus, 


W= 3 CmC_ ml =T 3 CmC, =T y \Cm|? =T +2 S> lal? 


m= —-® m= —oco m= —® 


or, dividing through by T to get the average power of x(t) over a period, we have 
(changing the dummy summation index back to k) 


WwW ~< 2 
T= +2 Del, 
k=1 


a result called Parseval’s theorem.”* 


?3 After the French mathematician Marc Antoine Parseval des Chenes (1755-1836). This result 
(in remarkably different form from what we’ve developed here) dates from 1799. 
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Parseval’s theorem has an elegant physical interpretation. As a preliminary 
comment to developing that interpretation, the reciprocal of the period, b is called 
the fundamental frequency of x(t). Combining the c_, and c, terms of the Fourier 
series of x(t), using Euler’s identity, we see that 


= ‘ F * F F * 
c_xe ikaot bits cpeiKoot = ce (ee) 4 cpeKoot = (cpe™*) zi cpeKoot 


=2Re{cxe"} = 2[a,.cos(kaot) — by sin(kaot)] 


where a; and b, are some real-valued constants such that c, = a, + iby. This is the 
trigonometric form of Fourier series I mentioned earlier. By elementary trigonom- 


etry, this represents a sinusoid with amplitude ,/a? + b; at frequency k@. This 


sinusoid is called the k” harmonic of x(t). So, the average power of x(t) is the sum of 
the zero frequency power (cd) and all the individual powers of the harmonics of x(t); 
the average power of the k" harmonic is 2\c,!*. This physical interpretation means 
that all our math is not simply abstract symbolism, but instead has a strong connec- 
tion to the real world. 

So far, we’ve thought of x(t) as periodic. What if it isn’t? We then can’t, it would 
seem, write x(t) as a Fourier series, right? Well, maybe we can, if we extend what we 
mean by the ‘period’ of a periodic signal. What if we say a non-periodic x(t) is 
periodic, with an infinite period? We just happen to be observing x(t) in its ‘current 
period’! This is, of course, nothing more or less than a sneaky trick, but it leads to 
something of enormous value. If we let T — oo in the Fourier series math, it can be 
shown”" that we get what is called the Fourier transform pair: 


X(@) =F{x(t)} = ‘ x(t)e dt (G.2) 
x(t) =F~'{X(o)} = x / ” x(a)el*do. (G.3) 


In these equations X(@) is the Fourier transform of x(t), and the second equation 
shows that we can recover x(t) from X(@), a claim written as x(t) + X(@). The 
second equation of the pair is called the inverse Fourier transform. As was (perhaps) 
the case for the Fourier series, these equations may at first look abstractly mysteri- 
ous, but there is real, physical significance to the Fourier transform. 

With what we’ve already done as inspiration, let’s define the energy of x(t) as 


W= f “x (0d 


and so we have 


4 The Science of Radio, pp. 168-170. 
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w= / “_x(dx(at= / “at 7 | X()e!'do| dt 
= / . = X(0) | / “_x(pemat do 
eT ; 


= / 3g )X" (@)do = / OK 


That is, we can calculate the energy of x(t) either in the t-domain or in the w-domain 
(getting the same answer, of course!). This result, 


7 ioe) ; _ oe) 1 ; 
w= fx (yar or |[X(@)|"do, 


— oo 


is called the Rayleigh/Plancherel energy theorem; it is the non-periodic equivalent 
of Parseval’s theorem for periodic signals. Here is an example of its use in a purely 
mathematical setting. Suppose 


e', O0<t<l1 
0, otherwise 


The energy of x(t) is 


1 =f 
we | eMat= (: ) 
=o 


The Fourier transform of x(t) is 


1 : 1 : e7 (Itia)t 
X(o) =| ete gt = i e” (I+io)tge = { = al = art 
0 Jo u 


which, with a little algebra, becomes 


1 
0 


1 


0 


1— 4 cos() + i4 sin() 
AO) 1+ io 


So, 


°5 After Lord Rayleigh (remember him from the Epilogue in Chap. 1?) who stated the theorem in 
1889, and the Swiss mathematician Michel Plancherel (1885-1967) who, in 1910, rigorously 
established the conditions under which the theorem is true. 
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2 {1-4 cos(w) }” + {i sin(w) }” 


X(o)| ao 


or, again with just a bit of algebra, 


1+4 —2cos() 
2 e e 
pie)? te 3 ele) 


Now, from the energy theorem we can therefore write 


1 felt + — 2 cos(a) _e-l 
on 1+@ a De 


— oo 


or, with just a bit more algebra, we arrive at 


[- cos(o) | op do m(e* — 1) 


wilto”  % | ..1+e 2e 
Since 

oe | a Foe a T\ 

/ 27 a) =a) t 


then we have the beautiful conclusion (involving two of the most important con- 
stants in mathematics) 


oo 1 +o oO Fe 2e e" 


a cos() 2 n(e*—1) x 


This is an integral that often appears in textbooks on the theory of complex variables, 
as an illustration of doing a contour integral in the complex plane—which is the way 
we do it more generally in Sect. F7 of Appendix F (our result here is the result in 
Appendix F for the special case a = b = 1). 


Appendix H: Monte Carlo Simulation of the Buffon Needle 
Problem 


For the analysis of the Buffon Needle Problem (introduced in Chap. 1) that you’ll 
read here, let’s assume that the needle length (which was given in Chap. | as being 
equal to the line spacing”®) is 1 as measured in some system of units. This involves 


©The more general version of the needle problem allows the length of the needle to be any positive 
value relative to the line spacing. This elaboration is not difficult; see, for example, Prakash 
Gorroochurn, Classic Problems of Probability, Wiley 2012, pp. 159-168. 
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Fig. H1 A needle of length 1 ‘just touching’ the upper parallel ruled line 


no loss of generality because we are free to choose whatever units of distance we 
wish, and the probability of the needle crossing a ruled line shouldn’t depend on an 
arbitrary choice of units, After all, systems of units are nothing more than historical 
accidents! Now, with that settled, realize as we start that we must first consider what 
it means to say the needle lands on the ruled surface ‘at random.’ 

To answer that question, concentrate your attention on an arbitrary pair of 
adjacent parallel lines, between which the mid-point of the needle eventually 
comes to rest after a toss. If we imagine a mid-line (the dotted line in Fig. H1) 
between the two parallel lines, we'll take random to mean that the needle’s 
mid-point is as likely to be above the mid-line as it is to be below the mid-line. 
Let’s first consider the case of the needle’s mid-point being above the mid-line. If we 
write the distance the mid-point of the needle is above the mid-line as Y, then we’ll 
take Y to be a uniformly distributed random variable from 0 to 5: (Y = 0 is the 
mid-line.) This, of course, does not give us a complete description of a randomly 
positioned needle. To complete that description, we also need to look at the angle @ 
the needle makes with the mid-line. To that end, we’ll take 6 as a uniformly 
distributed random variable (independent of Y) from 0° to 180° (0 to z radians).7’ 

Figure H1 shows a needle ‘lying at random’ just on the verge of crossing the 
upper parallel ruled line (the top end of the needle, E2, is on the upper parallel ruled 
line). If the needle’s angle is increased from the critical value of 8 = 0, we see that E2 
moves upward and the needle crosses the upper ruled line. From the figure we have 


or, for a needle on the verge of crossing the upper ruled line, 


?7Students often asked, when I discussed this problem in lecture, why 0 isn’t taken as being 
uniformly distributed from 0° to 360°? Think about that for now, and I’ll return to the question 
later in the analysis. 
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Needle crossing (v > + = ” sin (0) 


= 


1.1. 
|— Y= 7-7 sin (8) 


q Tt 
2 


Fig. H2 The sample space for the Buffon Needle Problem 


1 1 
og 
We can use this to assert the following: given that the mid-point of the needle falls 
above the mid-line then, if the needle’s angle is 0, the needle crosses the upper ruled 
line if 

Vs = aie). (H.1) 
2 2 
(Try some particular values of 0, like O° or 90°, and confirm that the resulting 
inequality for Y makes physical sense.) 

The sample space of the Buffon Needle Problem is shown in Fig. H2, with the 
shaded-in region containing all the points in sample space that represent the needle 
crossing the upper ruled line. Using the same geometric probability argument that we 
discussed in the Ed and Gail meeting problem of Chap. | (that is, by associating an 
area with a probability), we have the conditional probability that the needle, given that 
its mid-point is above the mid-line, crosses the upper ruled line is 


®x- [@-1sney}o 3-1 [ (1— sin(ey}a0 i 
7 ; = — =1 (8 + cos(6)] 
(3) 2 e 
0 
=1—=[n+ cos(n) — 0 — cos(0)] =1 ‘in ffi (1 2) ==, 


°8To compute the area of the shaded region in Fig. H2 (the region in sample space where inequality 
(1) is satisfied), I subtracted the area under the Y-curve from the total area of the sample space. 
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If you repeat this entire argument for the case of the mid-point of the needle below 
the mid-line (and so, if the needle crosses a ruled line, it would be the Jower ruled 
line) you'll get exactly the same result. Or, more easily, just invoke symmetry. So, 
we have the conditional probability that the needle, given that its mid-point is below 
the mid-line, is 2 Just as with the Ed-and-Gail problem, we convert these two 
conditional probabilities into the probability of the needle crossing a ruled line by 
taking the average of the two conditionals. Since the conditionals are equal, we have 
the probability of a randomly tossed needle (of length equal to the ruled-line spacing) 
crossing a line is 2 just as calculated by Laplace. 

To write a Monte Carlo simulation of the Buffon Needle problem is very easy. 
From this point-on let’s measure vertical distance from the lower ruled line. All we 
have to do, after we pick random values for the y-coordinate of the needle mid-point 
(a value between 0 and 1) and the needle angle (a value between 0 and z) is calculate 
the y-coordinate of the needle’s ends (points El and E2 in Fig. H1). It is easy to see 
from that figure (remember, we are now measuring Y from the /ower ruled line, not 
from the mid-line) that 


Ypi =Y- ; sin(®), Yeo = Y + ; sin(0). 

If Yp; < 0 orif Y~> > | then the needle crosses a ruled line. A Monte Carlo code for 
doing this, for 100 million tosses of the needle, buffon.m, is in the following box. 

This code is, I think self-explanatory, with the single exception of the variable pi. 
In MATLAB the variable pi has the a priori value of the constant 3.14159265. .. . If 
you are writing code in a language which does not have that convention, then simply 
type pi = 3.14159265 in the first line. Following Laplace’s suggestion, since 2is the 
theoretical value for the code’s estimate for the probability of a crossing (which is 


ooums*), then the code estimates z as sPilogp » Running the code several times (10 s a 
loop crossings 


run) gave estimates for x ranging from 3.1416 to 3.1421 which seems (to me) to be a 
poor return for 100 million tosses.”° 


sbuffon.m 

crossings=0; 

for loop=1:100000000 
Y=orend> 


(continued) 


?° Okay, do you have an answer to the question posed in note 2? That is, why is the angle of the 
needle taken to be in the interval 0 < 8 < mand not 0 < 8 < 2x? In fact, either choice is okay! That’s 
because, after rotating the needle through x radians, the needle maps into itself (that is, the physical 
appearance of the needle for 0 < 0 < mis identical to its appearance for m < 8 < 22. So, while using 
0 <0 < 2m gives us twice the number of sample space points that represent a crossing as does using 
0 <0 <r, it doesn’t matter for computing the probability of a crossing because the sample space for 
0 < 0 < 2mis twice as large as the sample space for 0 < 0 < a. 
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A=pi*rand; 
CSeulia (A) /2 p 
Wi NaIEP 
WAS NAEP 
SLE Wal (0) 
crossings=crossings+1; 
else 
ase WD SaL 
crossings=crossings+1; 
end 
end 
end 
2* loop/crossings 


A serious philosophical/logical objection to buffon.m is that it assumes prior 
knowledge of the value of 1, the very quantity we are attempting to estimate! That 
has the feeling of circular reasoning to it. Fortunately, however, it is quite easy to 
describe a modified physical process of randomly tossing something onto a ruled 
surface that completely avoids this objection. It also has the virtue of using only high 
school geometry (no calculus). Imagine Buffon’s array of parallel, straight lines with 
uniform spacing d, together with a similar array of straight lines at right angles to the 
original array. The two arrays create a square tiling of the surface, as shown in 
Fig. H3. Inside each square imagine we have inscribed a circle. Finally, instead of 
tossing a needle we now randomly toss a ‘point object,’ like a BB or a pea or a tiny 
grain of sand. 


Fig. H3 Estimating x 
without already knowing 1 
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If we construct a rectangular co-ordinate system, with its origin X = Y = 0 as 
shown in Fig. H3, then the side-length of the square is d and the radius of the circle is 
q. The probability the point-object lands inside the circle is the ratio of the circle’s 
area to the area of the square. That is, the probability is 


tm 


d 
ae 3 
da 


? 


fia 


a result independent of d.*° If we take d = 2, the values of the independent X and 
Y are each in the interval —1 to +1. So, for each toss we generate a pair of random 
numbers (the values of X and Y) and if X? + Y* < 1 the point object is inside the 
circle. The Monte Carlo code nopiehere.m carries out this process 100 million times 
(note there is no x in the code, and hence the name). 

This code is ‘simpler’ than that of buffon.m and, in fact, executes twice as fast. 
But, alas, it is just as lacking in performance, with estimates for pi ranging over the 
interval 3.1414 to 3.1418. 


Snopiehere.m 
Exiscolo=0)p 
for loop=1:100000000 
X=-1+2*rand; 
Y=-1+2*rand; 
if X*°24+Y*2<1 
circle=circle+l; 
end 
end 
4*circle/loop 
crossings=0; 
for loop=1:100000000 
VY=rand; 
A=pi*rand; 
C=culia VA) (2p 
VANE p 
Y2=Y+C; 
if Y1<0 
crossings=crossings+1; 
else 
LE ¥2s1 
crossings=crossings+1; 
end 
end 
end 
2*loop/crossings 


3°The z-less estimation of pi is discussed (at a mathematical level a bit beyond that of this book) in 
Nathan Mantel, “An Extension of the Buffon Needle Problem,” The Annals of Mathematical 
Statistics, December 1953, pp. 674-677. 


186 Appendices 


Estimating 2 via Monte Carlo methods doesn’t appear to be very promising and 
so (I think) mathematicians have little reason to worry about needle/pea/BB/sand- 
tossing physicists encroaching on their territory. In fact, mathematicians and com- 
puter scientists have already exactly calculated x to an astonishing number of digits. 
As I write (mid-2023), pi is known to more than 62.8 trillion (that’s not a typo!) 
digits. Such a heroic calculation is not an idle effort either, as all those digits allow 
mathematicians to explore the subtle structure of pi. To see what I mean by that, use 
your favorite search engine to ask the question ‘is pi normal?’ What you’ll get back 
will explain the reference to the research of the fictional Professor Falk in note 7 of 
the Preface. 
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